browsermt / marian-dev
Fast Neural Machine Translation in C++ - development repository
☆19 Updated last year
Alternatives and similar repositories for marian-dev
Users who are interested in marian-dev are comparing it to the libraries listed below.
- Efficient teacher-student models and scripts to make them ☆51 Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆52 Updated 4 years ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆35 Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆27 Updated 11 months ago
- Documentation effort for the BookCorpus dataset ☆34 Updated 4 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- fasttext with wheels and no external dependency, but only the predict method (<1MB) ☆17 Updated 7 months ago
- Experiments with Hugging Face 🔬 🤗 ☆44 Updated 10 months ago
- Visegrad+ Parliament API. Access to parliament data of Visegrad+ countries in a common data standard. ☆12 Updated 9 years ago
- Fast and robust NLP components implemented in Java. ☆52 Updated 4 years ago
- Efficiently computing & storing token n-grams from large corpora ☆24 Updated 9 months ago
- The Open Virtual Assistant ☆56 Updated 4 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆18 Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆56 Updated 7 months ago
- Tool to extract the text from web article URLs and get word frequencies, entity recognition, automatic summaries, and more ☆20 Updated 6 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Python bindings for the fast integer compression library FastPFor. ☆60 Updated last year
- An open, comprehensive catalog of scholarship, connecting papers, authors, institutions, and journals. ☆10 Updated last year
- A database of number names for 186 languages, locales, and scripts ☆67 Updated 2 years ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆63 Updated 5 months ago
- NLP command-line assistant powered by OpenAI ☆21 Updated last year
- Extracts plain text, language identification and more metadata from WARC records ☆23 Updated 4 months ago
- ☆90 Updated 3 years ago
- URL downloader supporting checkpointing and continuous checksumming. ☆19 Updated last year
- finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests ☆41 Updated 2 years ago
- Test prompts for GPT-J-6B and the resulting AI-generated texts ☆53 Updated 4 years ago
- Indri search implementation on top of Lucene search engine ☆34 Updated last year
- Extract knowledge from raw text ☆13 Updated 3 years ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆50 Updated 2 months ago