Helsinki-NLP / OPUS-MT-train
Training open neural machine translation models
☆336 Updated 3 months ago
Related projects
Alternatives and complementary repositories for OPUS-MT-train
- Open neural machine translation models and web services ☆623 Updated last month
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆251 Updated last month
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆150 Updated 5 months ago
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro… ☆157 Updated last month
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆565 Updated last year
- A neural word aligner based on multilingual BERT ☆328 Updated 2 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT) ☆351 Updated last year
- Multilingual sentence alignment using sentence embeddings ☆101 Updated 2 weeks ago
- Open language modeling toolkit based on PyTorch ☆61 Updated this week
- ☆487 Updated 9 months ago
- Tools to download and cleanup Common Crawl data ☆971 Updated last year
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆230 Updated 2 years ago
- Facebook Low Resource (FLoRes) MT Benchmark ☆704 Updated last year
- ☆1,252 Updated last year
- BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages ☆219 Updated last year
- Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX. ☆252 Updated 2 years ago
- A small seq2seq punctuator tool based on DistilBERT ☆50 Updated 2 months ago
- Fast Neural Machine Translation in C++ ☆1,254 Updated last year
- MPNet: Masked and Permuted Pre-training for Language Understanding https://arxiv.org/pdf/2004.09297.pdf ☆288 Updated 3 years ago
- FastFormers - highly efficient transformer models for NLU ☆701 Updated 10 months ago
- OpusFilter - Parallel corpus processing toolkit ☆102 Updated 3 months ago
- LASER multilingual sentence embeddings as a pip package ☆225 Updated last year
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons ☆1,068 Updated 3 months ago
- Improved Sentence Alignment in Linear Time and Space ☆163 Updated last year
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆433 Updated 2 years ago
- A tool that locates, downloads, and extracts machine translation corpora ☆147 Updated 5 months ago
- cLang-8 is a dataset for grammatical error correction. ☆103 Updated 2 years ago
- ☆56 Updated 2 years ago
- Fast and customizable text tokenization library with BPE and SentencePiece support ☆284 Updated 2 months ago
- Python port of Moses tokenizer, truecaser and normalizer ☆488 Updated 5 months ago