mbanon / fastspell
Targeted language identifier, based on FastText and Hunspell.
☆34 Updated 2 weeks ago
Alternatives and similar repositories for fastspell:
Users who are interested in fastspell are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆70 Updated 10 months ago
- Tool to fix bitexts and tag near-duplicates for removal ☆29 Updated 3 weeks ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆49 Updated last month
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆154 Updated 8 months ago
- OpusFilter - Parallel corpus processing toolkit ☆104 Updated this week
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 Updated 3 years ago
- Python Finite-State Toolkit ☆51 Updated this week
- Bicleaner fork that uses neural networks ☆39 Updated 7 months ago
- Augmenty is an augmentation library based on spaCy for augmenting texts. ☆151 Updated 9 months ago
- Transform TMX to text ☆28 Updated 2 years ago
- 🧪 Cutting-edge experimental spaCy components and features ☆96 Updated 10 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated 11 months ago
- Faster, modernized fork of the language identification tool langid.py ☆53 Updated 3 months ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆31 Updated last year
- Extracts plain text, language identification and more metadata from WARC records ☆21 Updated 3 weeks ago
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆118 Updated 3 months ago
- coFR: COreference resolution tool for FRench (and singletons). ☆24 Updated 4 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆22 Updated 2 weeks ago
- Curriculum training ☆16 Updated last month
- Library for fast text representation and classification. ☆28 Updated last year
- Source code for the Apple reproduction ☆31 Updated 3 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆105 Updated 2 weeks ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆153 Updated 3 months ago
- Recon NER, Debug and correct annotated Named Entity Recognition (NER) data for inconsistencies and get insights on improving the quality … ☆106 Updated last year
- 80x faster and 95% accurate language identification with Fasttext ☆147 Updated last year
- spaCy-wrap is a wrapper library for spaCy for including fine-tuned transformers from Huggingface in your spaCy pipeline allowing you to i… ☆46 Updated 10 months ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆35 Updated 2 years ago
- Resource and Tool for Writing System Identification -- LREC 2024 ☆13 Updated 8 months ago