jacksonllee / iso639
ISO 639 language codes
☆34Updated 4 months ago
Related projects:
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing☆66Updated 2 weeks ago
- A file utility for accessing both local and remote files through a unified interface.☆36Updated last month
- Rust Python bindings for symspell☆18Updated 8 months ago
- Python Finite-State Toolkit☆39Updated last month
- Tool to fix bitexts and tag near-duplicates for removal☆29Updated last month
- Rust-based Python wrapper for duckling library in Haskell☆24Updated 3 years ago
- ISO 639 library for Python☆32Updated 2 weeks ago
- fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified (zh-hans) and Traditional Chinese (zh-ha…☆38Updated last year
- A Python package for grapheme-aware string handling☆104Updated 2 years ago
- An extension package of 🤗 Datasets that provides support for executing arbitrary SQL queries on HF datasets☆31Updated 7 months ago
- 🧬 A VS Code extension for annotating data with Prodigy☆30Updated 2 years ago
- Targeted language identifier, based on FastText and Hunspell.☆27Updated 2 weeks ago
- Source code for the Apple reproduction☆30Updated 3 years ago
- A Python module to reduce Unicode to a 'good enough' ASCII representation (outdated GitHub copy)☆36Updated 13 years ago
- An experimental implementation of Burrows' Delta in Python 3☆20Updated 2 years ago
- Confection: the sweetest config system for Python☆175Updated 3 months ago
- Python module for accessing databases using the ODBC API.☆12Updated 2 months ago
- A Python package to simulate typographical errors.☆30Updated 9 months ago
- ☆29Updated 2 years ago
- The Seshat audio annotation management platform☆13Updated 3 years ago
- Code for SaGe subword tokenizer (EACL 2023)☆21Updated this week
- A pure Python Levenshtein implementation that's not freaking GPL'd.☆97Updated last year
- This is a prototype of a multi-lingual suite for named-entity recognition in Python.☆21Updated 4 months ago
- Searching in-memory corpus with Corpus Query Language (CQL)☆17Updated 2 years ago
- Check for multiple patterns in a single string at the same time: a fast Aho-Corasick algorithm for Python☆160Updated 2 weeks ago
- ☆70Updated last year
- Source code and data for Like a Good Nearest Neighbor☆28Updated 7 months ago
- Language detection using spaCy and fastText☆53Updated 9 months ago
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format☆31Updated 5 years ago
- Repo to hold code and track issues for the collection of permissively licensed data☆22Updated 3 weeks ago