glample/fastBPE

glample / fastBPE

Fast BPE

☆677

Alternatives and similar repositories for fastBPE

Users that are interested in fastBPE are comparing it to the libraries listed below.

rsennrich / subword-nmt
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
☆2,271 · Aug 7, 2024 · Updated last year
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,927 · Feb 14, 2023 · Updated 3 years ago
hplt-project / sacremoses
Python port of Moses tokenizer, truecaser and normalizer
☆497 · Feb 6, 2026 · Updated 5 months ago
clab / fast_align
Simple, fast unsupervised word aligner
☆769 · Jul 19, 2022 · Updated 4 years ago
facebookresearch / UnsupervisedMT
Phrase-Based & Neural Unsupervised Machine Translation
☆1,499 · Sep 15, 2021 · Updated 4 years ago
VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆979 · Mar 29, 2024 · Updated 2 years ago
moses-smt / mosesdecoder
Moses, the machine translation system
☆1,625 · Mar 28, 2025 · Updated last year
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,661 · May 2, 2024 · Updated 2 years ago
google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
☆11,969 · Updated this week
microsoft / MASS
MASS: Masked Sequence to Sequence Pre-training for Language Generation
☆1,117 · Nov 28, 2022 · Updated 3 years ago
mjpost / sacrebleu
Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
☆1,253 · Updated this week
facebookresearch / MUSE
A library for Multilingual Unsupervised or Supervised word Embeddings
☆3,248 · Aug 31, 2022 · Updated 3 years ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 · Sep 22, 2025 · Updated 9 months ago
marian-nmt / marian
Fast Neural Machine Translation in C++
☆1,458 · Aug 25, 2023 · Updated 2 years ago
bheinzerling / bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
☆1,222 · Oct 1, 2024 · Updated last year
artetxem / vecmap
A framework to learn cross-lingual word embedding mappings
☆655 · Apr 22, 2023 · Updated 3 years ago
rsennrich / wmt16-scripts
scripts and configuration files for Edinburgh neural MT submission to WMT 16 shared translation task
☆139 · Nov 5, 2020 · Updated 5 years ago
artetxem / monoses
Unsupervised Statistical Machine Translation
☆232 · Aug 30, 2020 · Updated 5 years ago
THUNLP-MT / Document-Transformer
Improving the Transformer translation model with document-level context
☆170 · Jul 7, 2020 · Updated 6 years ago
artetxem / undreamt
Unsupervised Neural Machine Translation
☆474 · Jul 8, 2020 · Updated 6 years ago
THUNLP-MT / MT-Reading-List
A machine translation reading list maintained by Tsinghua Natural Language Processing Group
☆2,437 · Aug 9, 2024 · Updated last year
OpenNMT / OpenNMT-py
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
☆7,007 · Oct 14, 2025 · Updated 9 months ago
facebookresearch / unlikelihood_training
Neural Text Generation with Unlikelihood Training
☆311 · Aug 31, 2021 · Updated 4 years ago
neulab / word-embeddings-for-nmt
Supplementary material for "When and Why Are Pre-trained Word Embeddings Useful for Neural Machine Translation?" at NAACL 2018
☆123 · Sep 22, 2025 · Updated 9 months ago
salesforce / awd-lstm-lm
LSTM and QRNN Language Model Toolkit for PyTorch
☆1,990 · Feb 12, 2022 · Updated 4 years ago
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,244 · Sep 30, 2025 · Updated 9 months ago
EdinburghNLP / nematus
Open-Source Neural Machine Translation in Tensorflow
☆805 · Dec 9, 2022 · Updated 3 years ago
zihangdai / xlnet
XLNet: Generalized Autoregressive Pretraining for Language Understanding
☆6,180 · May 28, 2023 · Updated 3 years ago
salesforce / ctrl
Conditional Transformer Language Model for Controllable Generation
☆1,881 · May 1, 2025 · Updated last year
nyu-mll / jiant
jiant is an nlp toolkit
☆1,675 · Jul 6, 2023 · Updated 3 years ago
harvardnlp / pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
☆1,132 · Apr 20, 2022 · Updated 4 years ago
M4t1ss / SoftAlignments
Neural machine translation soft alignment visualisations for web and command line
☆73 · Aug 19, 2021 · Updated 4 years ago
shrimai / Towards-Content-Transfer-through-Grounded-Text-Generation
☆33 · May 15, 2019 · Updated 7 years ago
allenai / allennlp
An open-source NLP research library, built on PyTorch.
☆11,889 · Nov 22, 2022 · Updated 3 years ago
aizhanti / JaRuNC
Japanese--Russian--English News Commentary Parallel Data
☆18 · Jul 9, 2019 · Updated 7 years ago
WikiExtractor / wikiextractor
A tool for extracting plain text from Wikipedia dumps
☆3,997 · Updated this week
pytorch / text
Models, data loaders and abstractions for language processing, powered by PyTorch
☆3,559 · Sep 10, 2025 · Updated 10 months ago
facebookresearch / evaluation-of-nmt-bt
This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets …
☆15 · Aug 31, 2021 · Updated 4 years ago
bzhangGo / zero
Zero -- A neural machine translation system
☆152 · May 8, 2023 · Updated 3 years ago