mbanon / fastspell
Targeted language identifier, based on FastText and Hunspell.
☆37 · Updated last month
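The one-line description above implies a two-stage design: a general-purpose identifier (fastText) makes a first guess, and spell-checking (Hunspell) helps choose between closely related languages. The sketch below is only a rough, self-contained illustration of that idea and is not fastspell's actual API or logic. The toy word sets stand in for Hunspell dictionaries, the `first_guess` argument stands in for a fastText prediction, and the names `DICTIONARIES`, `SIMILAR`, `count_errors` and `refine` are all hypothetical.

```python
# Hypothetical toy word lists standing in for Hunspell dictionaries.
DICTIONARIES = {
    "es": {"el", "perro", "come", "la", "casa"},
    "gl": {"o", "can", "come", "a", "casa"},
    "ca": {"el", "gos", "menja", "la", "casa"},
}

# Hypothetical groups of easily confused languages, keyed by first-pass guess.
SIMILAR = {"es": ["gl", "ca"]}


def count_errors(text: str, lang: str) -> int:
    """Count words that are missing from the toy dictionary for `lang`."""
    words = (w.strip(".,;:!?¡¿").lower() for w in text.split())
    return sum(1 for w in words if w and w not in DICTIONARIES[lang])


def refine(text: str, first_guess: str) -> str:
    """Refine a first-pass language guess using spell-check error counts."""
    alternatives = SIMILAR.get(first_guess)
    if not alternatives:
        return first_guess  # no confusable languages: trust the first pass
    candidates = [first_guess] + alternatives
    # min() returns the first candidate with the lowest error count,
    # so ties are resolved in favour of the first-pass guess.
    return min(candidates, key=lambda lang: count_errors(text, lang))


print(refine("o can come", "es"))  # gl
```

Here the first pass says "es", but the Galician toy dictionary recognizes every word while the Spanish one misses two, so the guess is refined to "gl". Falling back to the first-pass guess on ties keeps the refinement conservative.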
Alternatives and similar repositories for fastspell
Users who are interested in fastspell are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) · ☆75 · Updated 6 months ago
- GlotWeb: Web Indexing for Low-Resource Languages -- under construction. · ☆15 · Updated 2 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. · ☆52 · Updated 3 weeks ago
- Faster, modernized fork of the language identification tool langid.py · ☆59 · Updated 11 months ago
- Extracts plain text, language identification and more metadata from WARC records · ☆23 · Updated last month
- AfroLID, a powerful neural toolkit for African language identification which covers 517 African languages. · ☆32 · Updated 7 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. · ☆156 · Updated last year
- Sentence transformers models for spaCy · ☆107 · Updated 2 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. · ☆159 · Updated last year
- Resource and Tool for Writing System Identification -- LREC 2024 · ☆20 · Updated last year
- OpusFilter - Parallel corpus processing toolkit · ☆110 · Updated 3 weeks ago
- ☆78 · Updated 2 months ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… · ☆68 · Updated 2 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency · ☆178 · Updated 4 months ago
- Python Finite-State Toolkit · ☆58 · Updated 2 weeks ago
- 80x faster and 95% accurate language identification with Fasttext · ☆161 · Updated last year
- Tool to fix bitexts and tag near-duplicates for removal · ☆33 · Updated last month
- BERT and ELECTRA models trained on Europeana Newspapers · ☆38 · Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata · ☆164 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy · ☆63 · Updated last year
- 🧪 Cutting-edge experimental spaCy components and features · ☆102 · Updated last year
- These are lists for a variety of languages containing words that are distinctive to each language. · ☆38 · Updated 3 years ago
- Text tokenization and sentence segmentation (segtok v2) · ☆206 · Updated 3 years ago
- A sentence segmentation library with wide language support optimized for speed and utility. · ☆68 · Updated last week
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… · ☆46 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) · ☆154 · Updated 2 years ago
- Library for fast text representation and classification. · ☆31 · Updated last year
- In-the-wild extraction of entities that are found using Flair and displayed using a very elegant front-end. · ☆71 · Updated 2 years ago
- NTREX -- News Test References for MT Evaluation · ☆85 · Updated last year
- Seed Machine Translation Data · ☆33 · Updated 11 months ago