bigscience-workshop / multilingual-modeling

BLOOM+1: Adapting BLOOM model to support a new unseen language

☆73

Alternatives and similar repositories for multilingual-modeling

Users who are interested in multilingual-modeling are comparing it to the repositories listed below.

cisnlp / Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
☆103 Updated last year
google-research / mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
☆111 Updated 4 months ago
cindyxinyiwang / expand-via-lexicon-based-adaptation
Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
☆30 Updated 3 years ago
machelreid / m2d2
M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer
☆54 Updated 2 years ago
yxuansu / Contrastive_Search_Is_What_You_Need
[TMLR'23] Contrastive Search Is What You Need For Neural Text Generation
☆119 Updated 2 years ago
mbzuai-nlp / bactrian-x
A Multilingual Replicable Instruction-Following Model
☆94 Updated 2 years ago
MicrosoftTranslator / NTREX
NTREX -- News Test References for MT Evaluation
☆84 Updated last year
shayne-longpre / a-pretrainers-guide
☆72 Updated 2 years ago
martiansideofthemoon / rankgen
Official code and model checkpoints for our EMNLP 2022 paper "RankGen - Improving Text Generation with Large Ranking Models" (https://arx…
☆136 Updated 2 years ago
lgessler / microbert
A tiny BERT for low-resource monolingual models
☆31 Updated 10 months ago
juletx / self-translate
Do Multilingual Language Models Think Better in English?
☆42 Updated 2 years ago
google-research / t5x_retrieval
☆100 Updated 2 years ago
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset.
☆93 Updated 2 years ago
CPJKU / wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
☆82 Updated 10 months ago
adapter-hub / hgiyt
Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"
☆27 Updated 3 years ago
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆58 Updated last year
ahmetustun / udapter
UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi…
☆31 Updated 2 years ago
ZurichNLP / nmtscore
A library of translation-based text similarity measures
☆25 Updated last year
cambridgeltl / composable-sft
A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings.
☆74 Updated 11 months ago
martiansideofthemoon / longeval-summarization
Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https…
☆44 Updated 11 months ago
jungokasai / beam_with_patience
☆46 Updated 3 years ago
google-research / xtreme-up
☆51 Updated 2 years ago
huggingface / that_is_good_data
☆66 Updated last year
alirezamshi-zz / small100
Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E…
☆23 Updated 2 years ago
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆58 Updated 2 years ago
inspired-cognition / critique-apps
Apps built using Inspired Cognition's Critique.
☆58 Updated 2 years ago
thevasudevgupta / transformers-adapters
This repository hosts my experiments for the project I did with OffNote Labs.
☆10 Updated 4 years ago
deep-spin / qaware-decode
A repository for experiments in quality-aware decoding
☆17 Updated 3 years ago
Rojak-NLP / LLM-Code-Mixing
Can LLMs generate code-mixed sentences through zero-shot prompting?
☆11 Updated 2 years ago
nyu-mll / SQuALITY
Query-focused summarization data
☆42 Updated 2 years ago