google / cld3
☆818, updated last year
Alternatives and similar repositories for cld3:
Users who are interested in cld3 are comparing it to the libraries listed below.
- Compact Language Detector 2 ☆858, updated 3 years ago
- Bitextor generates translation memories from multilingual websites ☆292, updated 5 months ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,211, updated 6 months ago
- Fast and customizable text tokenization library with BPE and SentencePiece support ☆302, updated 7 months ago
- Modern spell checking library - accurate, fast, multi-language ☆634, updated 7 months ago
- Fast Neural Machine Translation in C++ ☆1,315, updated last year
- Simple, fast unsupervised word aligner ☆751, updated 2 years ago
- Heuristic based boilerplate removal tool ☆765, updated last month
- Python port of Moses tokenizer, truecaser and normalizer ☆493, updated 10 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆151, updated last year
- Open neural machine translation models and web services ☆680, updated 4 months ago
- Tools to download and cleanup Common Crawl data ☆999, updated last year
- Unsupervised text tokenizer focused on computational efficiency ☆965, updated last year
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆505, updated 5 years ago
- ☆169, updated 3 weeks ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆378, updated 4 months ago
- Training open neural machine translation models ☆357, updated last month
- 🐍💯 pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆845, updated 7 months ago
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ☆824, updated 3 weeks ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆157, updated 10 months ago
- Language-Agnostic SEntence Representations ☆3,632, updated 11 months ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆161, updated 4 years ago
- Python bindings for cld3 ☆27, updated last year
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆310, updated last year
- A neural word aligner based on multilingual BERT ☆346, updated 3 years ago
- ☆501, updated last year
- Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages. ☆566, updated 8 months ago
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆726, updated last year
- Sentence aligner ☆112, updated 3 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆361, updated last year