mbanon / fastspell
Targeted language identifier, based on FastText and Hunspell.
☆38, updated 4 months ago
Alternatives and similar repositories for fastspell
Users who are interested in fastspell are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) (☆74, updated 9 months ago)
- 🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction. (☆17, updated 4 months ago)
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. (☆55, updated 3 months ago)
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency (☆181, updated 7 months ago)
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. (☆35, updated 9 months ago)
- Python Finite-State Toolkit (☆60, updated last week)
- Extracts plain text, language identification and more metadata from WARC records (☆23, updated 3 months ago)
- These are lists for a variety of languages containing words that are distinctive to each language. (☆40, updated 3 years ago)
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy (☆63, updated last year)
- 80x faster and 95% accurate language identification with Fasttext (☆163, updated last year)
- Resource and Tool for Writing System Identification -- LREC 2024 (☆21, updated last week)
- Faster, modernized fork of the language identification tool langid.py (☆61, updated last year)
- OpusFilter - Parallel corpus processing toolkit (☆113, updated 2 weeks ago)
- 🧪 Cutting-edge experimental spaCy components and features (☆105, updated last year)
- Sentence transformers models for SpaCy (☆109, updated 2 years ago)
- Library for fast text representation and classification. (☆31, updated last year)
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. (☆159, updated last year)
- A sentence segmentation library with wide language support optimized for speed and utility. (☆77, updated 3 weeks ago)
- Searching in-memory corpus with Corpus Query Language (CQL) (☆19, updated last year)
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… (☆70, updated 2 years ago)
- Seed Machine Translation Data (☆33, updated last year)
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. (☆112, updated 7 months ago)
- Tool to fix bitexts and tag near-duplicates for removal (☆34, updated 4 months ago)
- Download and load spaCy models on-the-fly (☆15, updated 2 years ago)
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata (☆169, updated 3 years ago)
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. (☆256, updated 3 years ago)
- Python3 bindings for the Compact Language Detector v3 (CLD3) (☆155, updated 2 years ago)
- Augmenty is an augmentation library based on spaCy for augmenting texts. (☆156, updated last year)
- Source code for the Apple reproduction (☆32, updated 4 years ago)
- My NER Experiments with ModernBERT and Ettin (☆26, updated 5 months ago)