jacksonllee / iso639
ISO 639 language codes
☆47 Updated last week
Alternatives and similar repositories for iso639
Users who are interested in iso639 are comparing it to the libraries listed below.
- Next-generation Punkt sentence boundary detection with zero dependencies ☆24 Updated 3 months ago
- A Python library for working with and comparing language codes. ☆353 Updated 6 months ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆111 Updated 5 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆154 Updated 2 years ago
- Python Finite-State Toolkit ☆60 Updated last week
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆76 Updated 3 weeks ago
- Cython wrapper on Hunspell Dictionary ☆66 Updated last year
- A python package to simulate typographical errors. ☆38 Updated last year
- Pythonic search engine based on PyLucene. ☆131 Updated 3 weeks ago
- Faster, modernized fork of the language identification tool langid.py ☆61 Updated last year
- Hy-phen-ation made easy ☆217 Updated 9 months ago
- ☆175 Updated 7 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆179 Updated 5 months ago
- A Python implementation of Lunr.js 🌖 ☆201 Updated 8 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆52 Updated last month
- Accurately find/replace/remove emojis in text strings ☆162 Updated last year
- Confection: the sweetest config system for Python ☆191 Updated 2 weeks ago
- Targeted language identifier, based on FastText and Hunspell. ☆37 Updated 2 months ago
- A python true casing utility that restores case information for texts ☆89 Updated 3 years ago
- Searching in-memory corpus with Corpus Query Language (CQL) ☆19 Updated 11 months ago
- Tool to fix bitexts and tag near-duplicates for removal ☆34 Updated 2 months ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆65 Updated last week
- Multilingual syllable annotation pipeline component for spacy ☆39 Updated 2 years ago
- Check for multiple patterns in a single string at the same time: a fast Aho-Corasick algorithm for Python ☆215 Updated 2 weeks ago
- Robust and Fast tokenizations alignment library for Rust and Python https://tamuhey.github.io/tokenizations/ ☆193 Updated 2 years ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆63 Updated last year
- A python package for grapheme aware string handling ☆114 Updated 3 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆75 Updated 7 months ago
- 🌸 Train floret vectors ☆18 Updated 2 years ago
- ISO 639 library for Python ☆35 Updated last year