mbanon/fastspell

mbanon / fastspell

Targeted language identifier, based on FastText and Hunspell.

☆38

Alternatives and similar repositories for fastspell

Users that are interested in fastspell are comparing it to the libraries listed below.

bitextor / bifixer
Tool to fix bitexts and tag near-duplicates for removal
☆35 · Sep 4, 2025 · Updated 10 months ago
paracrawl / keops
Tool for manual evaluation of parallel sentences.
☆15 · Jan 26, 2026 · Updated 5 months ago
hplt-project / OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
☆58 · Feb 3, 2026 · Updated 5 months ago
laurieburchell / open-lid-dataset
Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023)
☆77 · Apr 1, 2025 · Updated last year
loomchild / segment
Program used to split text into segments
☆28 · Oct 27, 2024 · Updated last year
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
hiredscorelabs / seqtolang
Multi-Language Identification
☆28 · Jul 25, 2024 · Updated last year
hplt-project / data-analytics-tool
HPLT Analytics
☆15 · Updated this week
bitextor / warc2text
Extracts plain text, language identification and more metadata from WARC records
☆23 · Apr 16, 2026 · Updated 3 months ago
gidim / Babler
Data Collection System For NLP/Speech Recognition
☆25 · Apr 20, 2021 · Updated 5 years ago
bucky2177 / dRiftDM
dRiftDM
☆15 · Jun 6, 2026 · Updated last month
UniversalDependencies / UD_German-HDT
☆14 · May 29, 2026 · Updated last month
crate / crate-docs-theme
A Sphinx theme for the CrateDB documentation.
☆22 · Jul 6, 2026 · Updated 2 weeks ago
BPI-SINOVOIP / BPI-M2-bsp
Supports BananaPi BPI-M2 (Kernel 3.3)
☆11 · Nov 3, 2016 · Updated 9 years ago
browsermt / marian-dev
Fast Neural Machine Translation in C++ - development repository
☆23 · May 12, 2024 · Updated 2 years ago
gonglinyuan / metro_t0
Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)
☆22 · Nov 1, 2023 · Updated 2 years ago
ansrivas / pylogging
A small wrapper around the Python logging module that can easily format and write logs to a file.
☆12 · Jan 9, 2023 · Updated 3 years ago
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆29 · Apr 17, 2024 · Updated 2 years ago
mainlp / germanic-lrl-corpora
Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource…
☆28 · Feb 16, 2026 · Updated 5 months ago
ajavageek / rust-jvm
☆14 · Sep 6, 2023 · Updated 2 years ago
purescript-contrib / purescript-these
Data type isomorphic to α ∨ β ∨ (α ∧ β)
☆14 · Apr 27, 2022 · Updated 4 years ago
purescript-contrib / purescript-concurrent-queues
An unbounded and bounded queue for concurrent access.
☆10 · Apr 27, 2022 · Updated 4 years ago
webis-de / webis-tldr-17-corpus
Code for constructing TLDR corpus from Reddit dataset
☆27 · Nov 23, 2021 · Updated 4 years ago
jmccrae / yuzu
Micro-framework for publishing linked data
☆11 · Aug 1, 2017 · Updated 8 years ago
Helsinki-NLP / OpusTools
☆83 · Jun 24, 2026 · Updated 3 weeks ago
bitextor / bicleaner-ai
Bicleaner fork that uses neural networks
☆40 · Feb 23, 2026 · Updated 4 months ago
purescript-contrib / purescript-precise
A huge number library for PureScript with emphasis on correctness.
☆12 · Apr 27, 2022 · Updated 4 years ago
viz-rs / radix-tree
A radix tree implementation
☆15 · Sep 22, 2022 · Updated 3 years ago
AxelSorensenDev / Eevee
An Easy Annotation Tool for Natural Language Processing
☆12 · May 17, 2024 · Updated 2 years ago
orchid-hybrid / microKanren-sagittarius
microKanren sagittarius/larceny
☆11 · Jun 13, 2015 · Updated 11 years ago
adbar / py3langid
Faster, modernized fork of the language identification tool langid.py
☆63 · Nov 22, 2024 · Updated last year
maciejgryka / regex_help
Get a computer to write regex for you. A front-end for grex (https://github.com/pemistahl/grex).
☆11 · Sep 8, 2022 · Updated 3 years ago
bsolomon1124 / pycld3
Python3 bindings for the Compact Language Detector v3 (CLD3)
☆154 · Apr 19, 2026 · Updated 3 months ago
robertostling / eflomal
Efficient Low-Memory Aligner
☆148 · Jan 15, 2025 · Updated last year
Riccorl / sense-embedding
BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText
☆10 · Sep 3, 2019 · Updated 6 years ago
jfinkels / hyphenate
Hyphenation of English words
☆13 · Dec 21, 2016 · Updated 9 years ago
WebSpellChecker / wproofreader
WProofreader software development kit (SDK) offers multilingual spelling & grammar check API and JavaScript libraries for rich text edito…
☆13 · Jun 25, 2026 · Updated 3 weeks ago
sinaahmadi / wergor
Rule-based Kurdish Transliterator
☆11 · May 3, 2024 · Updated 2 years ago
fpdetective / modCrawler
Crawler based on a modified browser to detect online tracking.
☆11 · Jul 19, 2023 · Updated 3 years ago