dadelani / sib-200

SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

☆21

Alternatives and similar repositories for sib-200

Users who are interested in sib-200 are comparing it to the repositories listed below.

bigscience-workshop / multilingual-modeling
BLOOM+1: Adapting BLOOM model to support a new unseen language
☆72 Updated last year
cisnlp / Glot500
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023
☆103 Updated last year
ZurichNLP / ContraDecode
The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco…
☆35 Updated last year
mbzuai-nlp / bactrian-x
A Multilingual Replicable Instruction-Following Model
☆94 Updated 2 years ago
ZurichNLP / multilingual-instruction-tuning
Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?"
☆25 Updated last month
EleanorJiang / BlonDe
Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric …
☆77 Updated last year
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆133 Updated 6 months ago
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆58 Updated last year
lukemelas / mtob
☆36 Updated last year
nlp-uoregon / Okapi
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
☆97 Updated last year
huggingface / that_is_good_data
☆66 Updated last year
yxuansu / Contrastive_Search_Is_What_You_Need
[TMLR'23] Contrastive Search Is What You Need For Neural Text Generation
☆119 Updated 2 years ago
malteos / llm-datasets
A collection of datasets for language model pretraining including scripts for downloading, preprocessing, and sampling.
☆59 Updated 11 months ago
juletx / self-translate
Do Multilingual Language Models Think Better in English?
☆42 Updated last year
google-research / url-nlp
☆215 Updated 2 weeks ago
malteos / clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
☆30 Updated 2 years ago
MicrosoftTranslator / GEMBA
GEMBA — GPT Estimation Metric Based Assessment
☆119 Updated 11 months ago
shayne-longpre / a-pretrainers-guide
☆72 Updated 2 years ago
microsoft / Multilingual-Evaluation-of-Generative-AI-MEGA
Code for Multilingual Eval of Generative AI paper published at EMNLP 2023
☆70 Updated last year
ahmetustun / hyperx
☆20 Updated 2 years ago
bigscience-workshop / data_tooling
Tools for managing datasets for governance and training.
☆85 Updated last month
google-research / mt-metrics-eval
Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.
☆110 Updated 4 months ago
ffaltings / InteractiveTextGeneration
☆34 Updated 2 years ago
cambridgeltl / composable-sft
A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings.
☆74 Updated 11 months ago
CPJKU / wechsel
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.
☆82 Updated 10 months ago
gsarti / pecore
Materials for "Quantifying the Plausibility of Context Reliance in Neural Machine Translation" at ICLR'24 🐑 🐑
☆15 Updated last year
NJUNLP / MMT-LLM
☆34 Updated 2 years ago
liuzeming01 / XDailyDialog
https://liuzeming01.github.io/XDailyDialog/
☆10 Updated 2 years ago
yxuansu / Contrastive_Search_versus_Contrastive_Decoding
An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation
☆27 Updated last year
kaiyuhwang / MLLM-Survey
A paper list of multilingual pre-trained models (continually updated).
☆22 Updated last year