browsermt / marian-dev
Fast Neural Machine Translation in C++ - development repository
☆19 · Updated 8 months ago
Alternatives and similar repositories for marian-dev:
Users who are interested in marian-dev are comparing it to the libraries listed below:
- Efficient teacher-student models and scripts to make them ☆49 · Updated last year
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆47 · Updated last month
- Efficiently computing & storing token n-grams from large corpora ☆17 · Updated 3 months ago
- Faster, modernized fork of the language identification tool langid.py ☆50 · Updated 2 months ago
- Thot toolkit for statistical machine translation ☆50 · Updated 2 years ago
- A sentence segmentation library with wide language support optimized for speed and utility. ☆55 · Updated 4 months ago
- A web interface to understand language-specific BERT-models ☆17 · Updated 9 months ago
- Highly specialized crate to parse and use `google/sentencepiece`'s precompiled_charsmap in `tokenizers` ☆18 · Updated 2 years ago
- Translation demonstrator ☆29 · Updated 4 years ago
- SALM: Suffix Array and its Applications in Empirical Language Processing by Joy ☆11 · Updated 7 years ago
- fasttext with wheels and no external dependency, but only the predict method (<1MB) ☆13 · Updated 2 months ago
- Corpus preprocessing ☆95 · Updated 10 months ago
- Fast approximate strings search & spelling correction ☆57 · Updated 3 years ago
- Transform TMX to text ☆28 · Updated 2 years ago
- Tools to evaluate accuracies of various (research papers') metadata extraction libraries ☆11 · Updated 9 years ago
- Fast stand-alone C++ decoder for RNN-based NMT models ☆25 · Updated 4 years ago
- LibreOffice Neural Machine Translation ☆70 · Updated 4 years ago
- Indri search implementation on top of Lucene search engine ☆34 · Updated 10 months ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆30 · Updated 10 months ago
- A database of number names for 186 languages, locales, and scripts ☆65 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆18 · Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆16 · Updated 2 years ago
- Bilingual sentence similarity classifier using Tensorflow ☆20 · Updated 5 years ago
- PANiC - PAraphrasing Noun-Compounds ☆15 · Updated 6 years ago
- Automatic extraction of edited sentences from text edition histories. ☆82 · Updated 2 years ago
- Documentation effort for the BookCorpus dataset ☆33 · Updated 3 years ago
- bin files ☆13 · Updated 2 months ago
- Experiments with Hugging Face 🔬 🤗 ☆45 · Updated 5 months ago
- BERT models for many languages created from Wikipedia texts ☆34 · Updated 4 years ago