google-research-datasets / uninum
A database of number names for 186 languages, locales, and scripts
☆66Updated last year
Related projects:
- Automatic extraction of edited sentences from text edition histories.☆80Updated 2 years ago
- An open-access corpus of conversational bilingual speech in Cantonese and English☆40Updated 2 years ago
- Tools for extracting parallel corpora from article titles across languages in Wikipedia☆72Updated 9 years ago
- Demonstration of the results in "Text Normalization using Memory Augmented Neural Networks" by Subhojeet Pramanik and Aman Hussain☆60Updated 5 years ago
- We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…☆80Updated 3 years ago
- A toolkit for producing n-gram language models. The highlights are the implementation of Kneser-Ney growing and revised Kneser pruning me…☆40Updated 2 weeks ago
- Bilingual dictionary extractor from parallel corpora☆21Updated 10 years ago
- ☆42Updated 6 years ago
- Neural machine translation soft alignment visualisations for web and command line☆72Updated 3 years ago
- ☆67Updated last month
- Spoken Language Translation System☆14Updated 5 years ago
- Corpus preprocessing☆95Updated 6 months ago
- Examples, tutorials and use cases for Marian, including our WMT-2017/18 baselines.☆78Updated last year
- Tool to fix bitexts and tag near-duplicates for removal☆29Updated last month
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target.☆45Updated 4 months ago
- Python implementation of Levenshtein distance and Levenshtein automata matching☆27Updated 5 years ago
- ☆11Updated 7 years ago
- A guide to building language technology in new languages.☆57Updated 2 years ago
- LSTM Language Model with Subword Units Input Representations☆43Updated 3 years ago
- Deep-learning-based sentence auto-segmentation from unstructured text without punctuation☆37Updated 7 years ago
- ☆12Updated 8 years ago
- ☆21Updated 4 years ago
- Efficient Low-Memory Aligner☆135Updated 2 weeks ago
- A program to choose transfer languages for cross-lingual learning☆70Updated last year
- Translation Error Rate (TER)☆43Updated 6 years ago
- Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshgh…☆24Updated last year
- Project OCELoT: an Open, Collaborative Evaluation Leaderboard of Translations☆20Updated 2 months ago
- Baseline models, training scripts, and instructions on how to reproduce our results for our state-of-the-art grammar correction system from M…☆69Updated 5 years ago
- A collection of basic Python modules for spoken natural language processing☆56Updated 4 years ago
- A language model-based approach to Grammatical Error Correction for English that uses minimal annotated data.☆49Updated 5 years ago