microsoft / Lightweight-Low-Resource-NMTLinks

Official code for "Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models" to appear in WMT 2022.

☆17

Alternatives and similar repositories for Lightweight-Low-Resource-NMT

Users that are interested in Lightweight-Low-Resource-NMT are comparing it to the libraries listed below

Sorting:

microsoft / UICaption
We release the UICaption dataset. The dataset consists of UI images (icons and screenshots) and associated text descriptions. This datase…
☆41Updated 2 years ago
microsoft / PLOG
☆22Updated 2 years ago
huggingface / olm-datasets
Pipeline for pulling and processing online language model pretraining data from the web
☆177Updated 2 years ago
shayne-longpre / a-pretrainers-guide
☆72Updated 2 years ago
EleutherAI / semantic-memorization
☆44Updated 11 months ago
microsoft / CodeRanker
Fault-aware neural code rankers
☆29Updated 2 years ago
xplip / pixel
Research code for pixel-based encoders of language (PIXEL)
☆339Updated 3 months ago
babylm / evaluation-pipeline-2023
Evaluation pipeline for the BabyLM Challenge 2023.
☆76Updated 2 years ago
allenai / bff
☆39Updated last year
microsoft / gpt-MT
☆84Updated 2 years ago
huggingface / that_is_good_data
☆65Updated 2 years ago
SeanNaren / min-LLM
Minimal code to train a Large Language Model (LLM).
☆172Updated 3 years ago
applicaai / CCpdf
Index of URLs to pdf files all over the internet and scripts
☆24Updated 2 years ago
Rallio67 / language-model-agents
Experiments with generating opensource language model assistants
☆97Updated 2 years ago
google-research-datasets / Hinglish-TOP-Dataset
Consists of the largest (10K) human annotated code-switched semantic parsing dataset & 170K generated utterance using the CST5 augmentati…
☆41Updated 2 years ago
google-research / t5x_retrieval
☆101Updated 2 years ago
yandex-research / DeDLOC
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
☆117Updated 3 years ago
facebookresearch / EditEval
An instruction-based benchmark for text improvements.
☆143Updated 2 years ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34Updated 2 years ago
LAION-AI / Big-Interleaved-Dataset
Big-Interleaved-Dataset
☆57Updated 2 years ago
martiansideofthemoon / hurdles-longform-qa
Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://a…
☆46Updated 3 years ago
facebookresearch / stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te…
☆283Updated this week
bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆74Updated last year
google-research / url-nlp
☆220Updated 2 months ago
SeanNaren / minGPT
A minimal PyTorch Lightning OpenAI GPT w DeepSpeed Training!
☆113Updated 2 years ago
martiansideofthemoon / rankgen
Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…
☆138Updated 2 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆96Updated 2 years ago
EleutherAI / pilev2
☆13Updated 2 years ago
allenai / c4-documentation
☆32Updated 4 years ago
LAION-AI / Open-Instruction-Generalist
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
☆209Updated last year