google / cld3
☆774 · Updated last year
Related projects:
- Compact Language Detector 2 · ☆836 · Updated 3 years ago
- Fast Neural Machine Translation in C++ · ☆1,225 · Updated last year
- Modern spell checking library - accurate, fast, multi-language · ☆605 · Updated 3 weeks ago
- Heuristic based boilerplate removal tool · ☆717 · Updated 4 months ago
- Bitextor generates translation memories from multilingual websites · ☆287 · Updated 3 months ago
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors · ☆482 · Updated 4 years ago
- 🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. · ☆782 · Updated last month
- Training open neural machine translation models · ☆321 · Updated last month
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy · ☆724 · Updated last month
- Simple, fast unsupervised word aligner · ☆732 · Updated 2 years ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) · ☆1,177 · Updated 6 months ago
- Port of Google's language-detection library to Python. · ☆1,709 · Updated 7 months ago
- Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/ · ☆696 · Updated 6 months ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files · ☆358 · Updated last week
- NeuSpell: A Neural Spelling Correction Toolkit · ☆662 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) · ☆148 · Updated last year
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… · ☆791 · Updated 2 weeks ago
- Fast and customizable text tokenization library with BPE and SentencePiece support · ☆276 · Updated 2 weeks ago
- Language-Agnostic SEntence Representations · ☆3,576 · Updated 4 months ago
- Unsupervised text tokenizer focused on computational efficiency · ☆953 · Updated 5 months ago
- 🦆 Contextually-keyed word vectors · ☆1,617 · Updated 6 months ago
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. · ☆364 · Updated last year
- Tools to download and cleanup Common Crawl data · ☆961 · Updated last year
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons · ☆1,040 · Updated last month
- Multilingual text (NLP) processing toolkit · ☆2,307 · Updated 10 months ago
- Python port of Moses tokenizer, truecaser and normalizer · ☆486 · Updated 3 months ago
- Automatically exported from code.google.com/p/chromium-compact-language-detector · ☆160 · Updated 3 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) · ☆347 · Updated 10 months ago
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy · ☆1,339 · Updated 3 months ago
- Stand-alone language identification system · ☆2,297 · Updated 4 years ago