VProv / BPE-Dropout

An official implementation of the "BPE-Dropout: Simple and Effective Subword Regularization" algorithm.

☆53

Alternatives and similar repositories for BPE-Dropout

Users who are interested in BPE-Dropout are comparing it to the libraries listed below.

EdinburghNLP / opus-100-corpus
☆94 Updated last year
deep-spin / qaware-decode
A repository for experiments in quality-aware decoding
☆17 Updated 3 years ago
rycolab / uid-decoding
☆41 Updated 4 years ago
ghrua / NgramRes
☆21 Updated 2 years ago
zcgzcgzcg1 / MediaSum
MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization
☆75 Updated 4 years ago
timoschick / dino
This repository contains the code for "Generating Datasets with Pretrained Language Models".
☆188 Updated 3 years ago
bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆73 Updated last year
cisnlp / Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
☆103 Updated last year
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆30 Updated 3 years ago
jacklxc / StandAloneSpellingCorrection
Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction"
☆18 Updated 4 years ago
cindyxinyiwang / multiview-subword-regularization
PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization"
☆25 Updated 4 years ago
yxuansu / Contrastive_Search_Is_What_You_Need
[TMLR'23] Contrastive Search Is What You Need For Neural Text Generation
☆119 Updated 2 years ago
rbawden / DiaBLa-dataset
English-French MT dialogue dataset
☆17 Updated 3 years ago
ricsinaruto / gutenberg-dialog
Build a dialog dataset from online books in many languages
☆76 Updated 2 years ago
google-research / mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
☆111 Updated 4 months ago
machelreid / m2d2
M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
☆54 Updated 2 years ago
tanyuqian / ctc-gen-eval
EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation
☆97 Updated 2 years ago
microsoft / DialogLM
Official Implementation of "DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarization."
☆141 Updated 2 years ago
thevasudevgupta / transformers-adapters
This repository hosts my experiments for the project I did with OffNote Labs.
☆10 Updated 4 years ago
facebookresearch / mlqe
We release a dataset based on Wikipedia sentences and the corresponding translations in 6 different languages along with the scores (scal…
☆81 Updated 3 years ago
grammarly / GMEG
GMEG
☆29 Updated 8 months ago
thompsonb / prism
MT Evaluation in Many Languages via Zero-Shot Paraphrasing
☆101 Updated last year
roeeaharoni / unsupervised-domain-clusters
Code and data accompanying our ACL 2020 paper, "Unsupervised Domain Clusters in Pretrained Language Models".
☆58 Updated 4 years ago
facebookresearch / asset
A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
☆56 Updated 2 years ago
shijie-wu / crosslingual-nlp
This repo supports various cross-lingual transfer learning & multilingual NLP models.
☆92 Updated last year
urvashik / knnmt
☆45 Updated 4 years ago
ZurichNLP / coverage-contrastive-conditioning
Data and code accompanying the paper "As Little as Possible, as Much as Necessary: Detecting Over- and Undertranslations with Contrastive…
☆22 Updated 2 years ago
AkariAsai / CORA
This is the official implementation of NeurIPS 2021 "One Question Answering Model for Many Languages with Cross-lingual Dense Passage Ret…
☆71 Updated 3 years ago
shijie-wu / neural-transducer
This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.
☆76 Updated last year
xlhex / dpe
☆22 Updated 4 years ago