google / cld3
☆828 · Updated 2 years ago
Alternatives and similar repositories for cld3
Users who are interested in cld3 are comparing it to the libraries listed below.
- Compact Language Detector 2 ☆863 · Updated 4 years ago
- Heuristic based boilerplate removal tool ☆780 · Updated 3 months ago
- Bitextor generates translation memories from multilingual websites ☆293 · Updated 6 months ago
- Article extraction benchmark: dataset and evaluation scripts ☆316 · Updated last year
- Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm… ☆827 · Updated last month
- ☆171 · Updated 2 months ago
- Port of Google's language-detection library to Python. ☆1,804 · Updated 3 months ago
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆375 · Updated 2 years ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆153 · Updated last year
- The most accurate natural language detection library for Python, suitable for short text and mixed-language text ☆1,376 · Updated last week
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆507 · Updated 5 years ago
- 🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box. ☆851 · Updated 9 months ago
- Python bindings for cld3 ☆27 · Updated last year
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆733 · Updated 9 months ago
- ☆508 · Updated last year
- Single-document unsupervised keyword extraction ☆1,731 · Updated this week
- Sentence aligner ☆113 · Updated 4 years ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆379 · Updated 6 months ago
- A python module for English lemmatization and inflection. ☆268 · Updated last year
- NLP, before and after spaCy ☆2,225 · Updated last year
- Multilingual text (NLP) processing toolkit ☆2,343 · Updated last year
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆361 · Updated last year
- python package to calculate readability statistics of a text object - paragraphs, sentences, articles. ☆1,288 · Updated 2 weeks ago
- Stand-alone language identification system ☆2,386 · Updated 5 years ago
- Python port of Moses tokenizer, truecaser and normalizer ☆494 · Updated last year
- 🦆 Contextually-keyed word vectors ☆1,653 · Updated last month
- Modern spell checking library - accurate, fast, multi-language ☆638 · Updated 9 months ago
- Elasticsearch with BERT for advanced document search. ☆899 · Updated 2 years ago
- ✔️Contextual word checker for better suggestions (not actively maintained) ☆413 · Updated 4 months ago
- This is a language detection library implemented in plain Java. (aliases: language identification, language guessing) ☆753 · Updated 6 years ago