uhermjakob / utoken
universal tokenizer
☆15 Updated 2 years ago
Related projects
Alternatives and complementary repositories for utoken
- Efficient teacher-student models and scripts to make them ☆48 Updated 11 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆22 Updated this week
- Bilingual sentence similarity classifier using Tensorflow ☆19 Updated 5 years ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆21 Updated 2 years ago
- bilingual dictionary extractor from parallel corpora ☆22 Updated 10 years ago
- Transform TMX to text ☆29 Updated last year
- Translation demonstrator ☆27 Updated 4 years ago
- A tiny BERT for low-resource monolingual models ☆29 Updated last month
- Multilingual Open Text ☆25 Updated 3 weeks ago
- common language and mathematics processing algorithms, in Rust ☆25 Updated 7 months ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆45 Updated 6 months ago
- Extracts plain text, language identification and more metadata from WARC records ☆20 Updated 3 months ago
- PANiC - PAraphrasing Noun-Compounds ☆15 Updated 6 years ago
- ☆16 Updated this week
- ParaNames: A multilingual resource for parallel names ☆30 Updated 6 months ago
- English Resource Grammar ☆18 Updated 3 months ago
- Finds linguistic patterns effortlessly ☆33 Updated last year
- A python module to process data for Frame Semantic Parsing ☆23 Updated 4 years ago
- Python Finite-State Toolkit ☆45 Updated last week
- Efficiently computing & storing token n-grams from large corpora ☆15 Updated last month
- ☆67 Updated 3 months ago
- 💫 A spaCy package for Yohei Tamura's Rust tokenizations library ☆27 Updated last year
- Lexical data at Unicode ☆66 Updated 2 months ago
- Library for fast text representation and classification. ☆28 Updated 10 months ago
- Scrapes some Finnish word definitions from English Wiktionary. ☆7 Updated last year
- 🐍🍑 Python 3 library for managing, annotating, and converting natural language corpuses using popular formats (CoNLL, ELAN, Praat, CSV, … ☆18 Updated 4 months ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 Updated 3 years ago
- MAMMOTH: MAssively Multilingual Modular Open Translation @ Helsinki ☆22 Updated this week
- Metadata Extractor & Loader (MEL) ■ The NLP-NER Toolkit (TNNT) ☆22 Updated last year
- Python library to work with ConceptNet offline ☆10 Updated last year