mbanon / fastspell
Targeted language identifier based on fastText and Hunspell.
☆36 · Updated 4 months ago
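The description above compresses a two-stage idea: a general-purpose identifier proposes a language, and spell-checking settles the cases where closely related languages are easily confused. The Python sketch below illustrates that idea only under stated assumptions. The `predict_language` stub stands in for a fastText model, the toy word sets stand in for Hunspell dictionaries, and `SIMILAR`, `DICTIONARIES`, `error_rate` and `targeted_identify` are hypothetical names. This is not fastspell's API, data, or exact decision rule.

```python
import re

# Hypothetical groups of easily confused languages, keyed by target language.
SIMILAR = {
    "es": ["es", "ca", "gl"],
}

# Toy word sets standing in for Hunspell dictionaries (hypothetical data).
DICTIONARIES = {
    "es": {"el", "gato", "come", "pescado", "y", "duerme"},
    "ca": {"el", "gat", "menja", "peix", "i", "dorm"},
    "gl": {"o", "gato", "come", "peixe", "e", "dorme"},
}


def predict_language(text: str) -> str:
    """Stub for a general-purpose identifier such as a fastText model.

    It always answers "es" so the refinement step has something to correct.
    """
    return "es"


def error_rate(text: str, lang: str) -> float:
    """Fraction of tokens not found in the toy dictionary for `lang`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 1.0
    known = DICTIONARIES[lang]
    misses = sum(1 for tok in tokens if tok not in known)
    return misses / len(tokens)


def targeted_identify(text: str, target: str) -> str:
    """Refine the first guess only when it falls among the target's similar languages."""
    guess = predict_language(text)
    candidates = SIMILAR.get(target, [target])
    if guess not in candidates:
        return guess  # outside the confusable group: keep the first-pass answer
    # Pick the candidate whose dictionary yields the fewest unknown words;
    # min() keeps the first candidate on ties, so list order breaks ties.
    return min(candidates, key=lambda lang: error_rate(text, lang))


print(targeted_identify("El gat menja peix i dorm", target="es"))       # ca
print(targeted_identify("El gato come pescado y duerme", target="es"))  # es
```

Applying this to real text would require swapping the stub for an actual language-identification model and the word sets for real spell-checking dictionaries; the sketch omits any thresholds or fallback labels a production tool might use.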
Alternatives and similar repositories for fastspell
Users who are interested in fastspell are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆74 · Updated 2 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆51 · Updated last week
- 🕸 GlotWeb: Web Indexing for Low-Resource Languages -- under construction. ☆13 · Updated 2 months ago
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Hugging Face in your spaCy pipeline, allowing you to i… ☆46 · Updated last year
- A multi-lingual approach to AllenNLP CoReference Resolution along with a wrapper for spaCy. ☆106 · Updated last year
- These are lists for a variety of languages containing words that are distinctive to each language. ☆38 · Updated 3 years ago
- Sentence Transformers models for spaCy ☆107 · Updated 2 years ago
- coFR: COreference resolution tool for FRench (and singletons). ☆24 · Updated 5 years ago
- 🧪 Cutting-edge experimental spaCy components and features ☆99 · Updated last year
- Python Finite-State Toolkit ☆56 · Updated this week
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆156 · Updated last year
- ☆74 · Updated 3 months ago
- 🖋 Resource and Tool for Writing System Identification -- LREC 2024 ☆16 · Updated last year
- OpusFilter - Parallel corpus processing toolkit ☆104 · Updated this week
- Transform TMX to text ☆27 · Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 · Updated 3 years ago
- A High-level Library for Named Entity Recognition in Python. ☆24 · Updated last year
- Tool to fix bitexts and tag near-duplicates for removal ☆30 · Updated 4 months ago
- Extracts plain text, identifies the language and collects other metadata from WARC records ☆22 · Updated 3 months ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆162 · Updated 2 years ago
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆138 · Updated 3 weeks ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆157 · Updated last year
- Easier Automatic Sentence Simplification Evaluation ☆162 · Updated last year
- GLADIS: A General and Large Acronym Disambiguation Benchmark (EACL 23) ☆16 · Updated last year
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆67 · Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 · Updated last year
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆31 · Updated 3 months ago
- Recon NER: debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 · Updated last year
- Language detection using spaCy and fastText ☆55 · Updated last year
- Cython wrapper on Hunspell Dictionary ☆66 · Updated last year