jacksonllee / iso639
ISO 639 language codes
☆41Updated 2 months ago
Alternatives and similar repositories for iso639:
Users who are interested in iso639 are comparing it to the libraries listed below.
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing☆70Updated 2 months ago
- Tool to fix bitexts and tag near-duplicates for removal☆30Updated 2 months ago
- Searching in-memory corpus with Corpus Query Language (CQL)☆19Updated 4 months ago
- A Python library for working with and comparing language codes.☆346Updated 4 months ago
- A Python module to reduce Unicode to a 'good enough' ASCII representation (outdated GitHub copy)☆40Updated 14 years ago
- Python package for calculating well-known measures in computational linguistics☆13Updated 5 months ago
- Rust python bindings for symspell☆19Updated last year
- Fast and accurate natural language detection. Detector written in Python. Nito-ELD, ELD.☆17Updated last year
- Python Finite-State Toolkit☆54Updated last month
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format☆33Updated 5 years ago
- An experimental implementation of Burrows' Delta in Python 3☆21Updated 3 years ago
- Code for SaGe subword tokenizer (EACL 2023)☆24Updated 4 months ago
- Multilingual syllable annotation pipeline component for spaCy☆39Updated 2 years ago
- Gamma Agreement in Python☆43Updated last year
- Cython wrapper on Hunspell Dictionary☆66Updated 9 months ago
- The fastest FlashText library for Python☆20Updated 9 months ago
- fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified (zh-hans) and Traditional Chinese (zh-ha…☆39Updated 2 years ago
- A Python package to simulate typographical errors.☆33Updated last year
- ISO 639 library for Python☆32Updated 7 months ago
- universal tokenizer☆17Updated 3 years ago
- Rust-based Python wrapper for duckling library in Haskell☆25Updated 4 years ago
- An accurate multilingual word aligner based on LaBSE☆21Updated last year
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki☆22Updated 2 months ago
- OpusFilter - Parallel corpus processing toolkit☆104Updated 3 weeks ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.☆80Updated 7 months ago
- A Python module for word inflections designed for use with spaCy.☆92Updated 5 years ago
- Fast syllable estimation library based on pattern matching.☆37Updated last month
- A sentence segmentation library with wide language support optimized for speed and utility.☆61Updated 7 months ago
- Source code for the Apple reproduction☆32Updated 3 years ago
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation.☆106Updated 2 months ago