google / cld3
☆811 · Updated last year
Alternatives and similar repositories for cld3:
Users who are interested in cld3 are comparing it to the libraries listed below:
- Compact Language Detector 2 ☆855 · Updated 3 years ago
- Heuristic based boilerplate removal tool ☆764 · Updated last month
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆151 · Updated last year
- ☆168 · Updated this week
- Bitextor generates translation memories from multilingual websites ☆292 · Updated 4 months ago
- Python port of Moses tokenizer, truecaser and normalizer ☆492 · Updated 10 months ago
- Fast Neural Machine Translation in C++ ☆1,305 · Updated last year
- Article extraction benchmark: dataset and evaluation scripts ☆309 · Updated 11 months ago
- Training open neural machine translation models ☆356 · Updated last week
- ☆501 · Updated last year
- 🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy ☆309 · Updated last year
- Tools to download and cleanup Common Crawl data ☆993 · Updated last year
- Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing ☆557 · Updated 4 months ago
- A neural word aligner based on multilingual BERT ☆344 · Updated 3 years ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆733 · Updated 7 months ago
- Process Common Crawl data with Python and Spark ☆422 · Updated last month
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆505 · Updated 5 years ago
- High-accuracy NLP parser with models for 11 languages. ☆880 · Updated 3 years ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆377 · Updated 4 months ago
- Simple, fast unsupervised word aligner ☆750 · Updated 2 years ago
- Improved Sentence Alignment in Linear Time and Space ☆169 · Updated 2 years ago
- Fast Neural Machine Translation in C++ - development repository ☆268 · Updated 5 months ago
- Python bindings for cld3 ☆27 · Updated last year
- Language Detection with Infinity-gram ☆231 · Updated 9 years ago
- Sentence aligner ☆112 · Updated 3 years ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,205 · Updated 5 months ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆412 · Updated last month
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆155 · Updated 9 months ago
- LASER multilingual sentence embeddings as a pip package ☆224 · Updated last year
- A sentence segmenter that actually works! ☆305 · Updated 4 years ago