browsermt / marian-dev
Fast Neural Machine Translation in C++ - development repository
☆19 · Updated 10 months ago
Alternatives and similar repositories for marian-dev:
Users who are interested in marian-dev are comparing it to the repositories listed below.
- Efficient teacher-student models and scripts to make them ☆50 · Updated last year
- fastText with wheels and no external dependencies, but only the predict method (<1MB) ☆13 · Updated 3 months ago
- Translation demonstrator ☆32 · Updated 4 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆33 · Updated 11 months ago
- A sentence segmentation library with wide language support optimized for speed and utility. ☆58 · Updated 6 months ago
- Extracts plain text, language identification and more metadata from WARC records ☆21 · Updated last week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆18 · Updated 2 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 · Updated last year
- Bilingual sentence similarity classifier using TensorFlow ☆20 · Updated 5 years ago
- Library and command-line utility for approximate string matching of a source against a bitext index, returning the matched source and target. ☆48 · Updated 2 months ago
- Fast stand-alone C++ decoder for RNN-based NMT models ☆25 · Updated 4 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆50 · Updated last month
- URL downloader supporting checkpointing and continuous checksumming. ☆19 · Updated last year
- Documentation effort for the BookCorpus dataset ☆33 · Updated 3 years ago
- Toolkit for training/converting LibreTranslate-compatible language models 🚂 ☆51 · Updated 4 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆54 · Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- A workflow system for Natural Language Processing. ☆21 · Updated 5 years ago
- Efficiently computing & storing token n-grams from large corpora ☆19 · Updated 5 months ago
- Robust Cross-lingual Embeddings from Parallel Sentences ☆22 · Updated 4 years ago
- Source code for the Apple reproduction ☆32 · Updated 3 years ago
- Seed Machine Translation Data ☆30 · Updated 4 months ago
- Tools to evaluate the accuracy of various metadata extraction libraries for research papers ☆11 · Updated 9 years ago
- Website for MS MARCO ☆28 · Updated last week
- Text similarity computation using minhashing and Jaccard distance on the Reuters dataset ☆16 · Updated 6 years ago
- Thot toolkit for statistical machine translation ☆53 · Updated 2 years ago
- Indri search implementation on top of Lucene search engine ☆34 · Updated last year
- An example of how to use spaCy for extremely large files without running into memory issues ☆36 · Updated 2 years ago