C++ mosestokenizer
☆18 · Mar 13, 2024 · Updated last year
Alternatives and similar repositories for fast-mosestokenizer
Users who are interested in fast-mosestokenizer are comparing it to the libraries listed below.
- Python package to augment multilingual data · ☆15 · Feb 15, 2023 · Updated 3 years ago
- Scripts used for the SMT system submitted to WMT 2014 · ☆12 · Apr 30, 2017 · Updated 8 years ago
- A library for data streaming and augmentation · ☆21 · May 5, 2025 · Updated 10 months ago
- Unsupervised multilingual sentence segmentation · ☆21 · Feb 26, 2021 · Updated 5 years ago
- ☆23 · Nov 6, 2022 · Updated 3 years ago
- ☆24 · Mar 13, 2020 · Updated 5 years ago
- Small utility to monitor fairseq training in TensorBoard · ☆21 · Apr 28, 2019 · Updated 6 years ago
- A library of translation-based text similarity measures · ☆25 · Dec 11, 2023 · Updated 2 years ago
- A tool that locates, downloads, and extracts machine translation corpora · ☆162 · Sep 18, 2025 · Updated 5 months ago
- YiSi: A Semantic Machine Translation Evaluation Metric for Evaluating Languages with Different Levels of Available Resources · ☆26 · May 28, 2019 · Updated 6 years ago
- Tool to fix bitexts and tag near-duplicates for removal · ☆34 · Sep 4, 2025 · Updated 6 months ago
- OpusFilter - Parallel corpus processing toolkit · ☆115 · Feb 11, 2026 · Updated 3 weeks ago
- Tools for formatting WMT hypothesis and test sets in XML · ☆27 · Apr 18, 2025 · Updated 10 months ago
- Matrix tools for building and inspecting latent spaces · ☆27 · Aug 19, 2018 · Updated 7 years ago
- ICU-based universal language tokenizer · ☆34 · Jan 19, 2022 · Updated 4 years ago
- Python port of Moses tokenizer, truecaser and normalizer · ☆495 · Feb 6, 2026 · Updated last month
- TER-plus machine translation metric · ☆31 · May 23, 2022 · Updated 3 years ago
- ☆14 · Aug 20, 2025 · Updated 6 months ago
- Human evaluation results and translation output for the Translator Human Parity Data release · ☆37 · Mar 19, 2018 · Updated 7 years ago
- Set up Wi-Fi through sound · ☆40 · May 3, 2022 · Updated 3 years ago
- Creating super-parallel corpora of more than 1,500 unique languages for NLP research · ☆34 · Dec 8, 2022 · Updated 3 years ago
- Corpus preprocessing · ☆100 · Mar 16, 2024 · Updated last year
- ☆14 · May 14, 2019 · Updated 6 years ago
- Russian phonetic transcription · ☆11 · Nov 19, 2025 · Updated 3 months ago
- Modified version of fairseq, including new implementations of criterions using reinforcement learning methods · ☆11 · Aug 14, 2019 · Updated 6 years ago
- ☆29 · Dec 20, 2025 · Updated 2 months ago
- Dynamic config system based on Python classes · ☆12 · Jan 27, 2023 · Updated 3 years ago
- Nanos klib for NVIDIA GPUs · ☆14 · Mar 25, 2025 · Updated 11 months ago
- ☆10 · Jan 24, 2021 · Updated 5 years ago
- ☆10 · Dec 12, 2022 · Updated 3 years ago
- Implementation of a fast semantic chunker in C++, installable in Python 3.7+ projects · ☆22 · Sep 20, 2025 · Updated 5 months ago
- Dataset for Coherent Topic Segmentation and Classification · ☆37 · Jan 31, 2020 · Updated 6 years ago
- Efficient teacher-student models and scripts to make them · ☆54 · Dec 16, 2023 · Updated 2 years ago
- Code repo for "SketchODE: Learning neural sketch representation in continuous time", published at ICLR 2022 · ☆11 · Apr 19, 2022 · Updated 3 years ago
- Piper-based VoiceDock TTS implementation · ☆11 · Aug 12, 2023 · Updated 2 years ago
- ☆10 · Mar 11, 2024 · Updated last year
- A local, voice-controlled AI assistant with the personality of HAL 9000 from 2001: A Space Odyssey · ☆22 · Aug 16, 2025 · Updated 6 months ago
- Repository for questions that are asked (or that you want answered!) during storytelling sessions · ☆12 · Sep 7, 2025 · Updated 6 months ago
- Library for experimenting with state-of-the-art evaluation metrics like UScore · ☆12 · May 27, 2023 · Updated 2 years ago