Helsinki-NLP / OPUS-MT-train
Training open neural machine translation models
☆351, updated 6 months ago
Alternatives and similar repositories for OPUS-MT-train:
Users interested in OPUS-MT-train are comparing it to the libraries listed below.
- Open neural machine translation models and web services (☆655, updated 2 months ago)
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… (☆267, updated last month)
- Bicleaner is a parallel corpus classifier/cleaner that aims to detect noisy sentence pairs in a parallel corpus. (☆154, updated 8 months ago)
- A neural word aligner based on multilingual BERT (☆338, updated 2 years ago)
- (☆57, updated 2 years ago)
- The pipeline for the OSCAR corpus (☆166, updated last year)
- BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages (☆221, updated last year)
- Fast Neural Machine Translation in C++ (☆1,286, updated last year)
- Bitextor generates translation memories from multilingual websites (☆293, updated 3 months ago)
- OpusFilter - Parallel corpus processing toolkit (☆104, updated 3 weeks ago)
- Improved Sentence Alignment in Linear Time and Space (☆165, updated last year)
- Easy-to-use, state-of-the-art Neural Machine Translation for 100+ languages (☆1,210, updated last year)
- The FLORES+ Machine Translation Benchmark (☆100, updated 3 months ago)
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) (☆357, updated last year)
- ⚡ Boost inference speed of T5 models by 5x & reduce the model size by 3x. (☆572, updated last year)
- Library for translating between 200 languages. Built on 🤗 transformers. (☆468, updated 5 months ago)
- Multilingual sentence alignment using sentence embeddings (☆108, updated 3 months ago)
- 80x faster and 95% accurate language identification with fastText (☆146, updated last year)
- Crosslingual Generalization through Multitask Finetuning (☆525, updated 5 months ago)
- A Neural Framework for MT Evaluation (☆542, updated last month)
- CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed) (☆362, updated 3 years ago)
- Python port of the Moses tokenizer, truecaser and normalizer (☆489, updated 8 months ago)
- Neural Machine Translation (NMT) tutorial covering data preprocessing, model training, evaluation, and deployment. (☆157, updated 10 months ago)
- A tool that locates, downloads, and extracts machine translation corpora (☆150, updated 8 months ago)
- Seed Machine Translation Data (☆30, updated 3 months ago)
- Text-to-sentence splitter using a heuristic algorithm by Philipp Koehn and Josh Schroeder. (☆238, updated 2 years ago)
- Open information and community for machine translation (☆73, updated last week)
- Neural end-to-end Speech Translation Toolkit (☆301, updated 2 years ago)
- Fast and customizable text tokenization library with BPE and SentencePiece support (☆297, updated 5 months ago)
- State-of-the-art LLM-based translation models. (☆486, updated 3 weeks ago)