google / cld3
☆830 · Updated last year
Alternatives and similar repositories for cld3:
Users who are interested in cld3 are comparing it to the libraries listed below:
- Compact Language Detector 2 ☆862 · Updated 3 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆151 · Updated last year
- Bitextor generates translation memories from multilingual websites ☆292 · Updated 6 months ago
- Heuristic based boilerplate removal tool ☆769 · Updated 2 months ago
- NeuSpell: A Neural Spelling Correction Toolkit ☆693 · Updated last year
- Tools to download and cleanup Common Crawl data ☆1,006 · Updated 2 years ago
- ☆170 · Updated last month
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆731 · Updated 8 months ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆377 · Updated 5 months ago
- A python module for English lemmatization and inflection. ☆268 · Updated last year
- Python bindings for cld3 ☆27 · Updated last year
- ☆505 · Updated last year
- A python tool for evaluating the quality of sentence embeddings. ☆2,106 · Updated last year
- Language-Agnostic SEntence Representations ☆3,637 · Updated last year
- Python port of Moses tokenizer, truecaser and normalizer ☆495 · Updated 11 months ago
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ☆824 · Updated 2 weeks ago
- Port of Google's language-detection library to Python. ☆1,793 · Updated 2 months ago
- Fast Neural Machine Translation in C++ ☆1,321 · Updated last year
- Fast BPE ☆671 · Updated 10 months ago
- Simple, fast unsupervised word aligner ☆752 · Updated 2 years ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,213 · Updated 7 months ago
- Language Detection with Infinity-gram ☆229 · Updated 9 years ago
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆375 · Updated 2 years ago
- Modern spell checking library - accurate, fast, multi-language ☆635 · Updated 8 months ago
- 🦆 Contextually-keyed word vectors ☆1,650 · Updated 2 weeks ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆360 · Updated last year
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆506 · Updated 5 years ago
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary. ☆316 · Updated 2 months ago
- ☆1,273 · Updated 2 years ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆413 · Updated 3 months ago