Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
★339 · Dec 18, 2024 · Updated last year
Alternatives and similar repositories for belebele
Users that are interested in belebele are comparing it to the libraries listed below.
- GlotCC Dataset and Pipeline -- NeurIPS 2024 · ★20 · Apr 6, 2025 · Updated 11 months ago
- ★19 · Sep 16, 2025 · Updated 6 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback · ★96 · Aug 18, 2023 · Updated 2 years ago
- COMET for African languages · ★11 · Jan 24, 2025 · Updated last year
- ★268 · Aug 1, 2025 · Updated 7 months ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically … · ★192 · Jun 16, 2022 · Updated 3 years ago
- Python interface for evaluation on ChatGPT models · ★19 · Feb 13, 2024 · Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark · ★132 · Aug 21, 2024 · Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 · ★106 · Apr 20, 2024 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ★2,965 · Mar 16, 2026 · Updated last week
- Facebook Low Resource (FLoRes) MT Benchmark · ★766 · Nov 20, 2023 · Updated 2 years ago
- Shami Dialect Corpus (SDC) · ★29 · Feb 13, 2018 · Updated 8 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation · ★14 · Aug 19, 2025 · Updated 7 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" · ★30 · Apr 2, 2022 · Updated 3 years ago
- Robust recipes to align language models with human and AI preferences · ★5,535 · Sep 8, 2025 · Updated 6 months ago
- ★1,262 · Jul 30, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… · ★226 · Sep 18, 2025 · Updated 6 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… · ★28 · Apr 17, 2024 · Updated last year
- Code for fine-tuning Platypus fam LLMs using LoRA · ★629 · Feb 4, 2024 · Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… · ★35 · Aug 15, 2023 · Updated 2 years ago
- Human Rights Corpus (#인권코퍼스) · ★31 · Oct 6, 2023 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… · ★31 · Aug 25, 2023 · Updated 2 years ago
- ★19 · May 23, 2024 · Updated last year
- ★10 · Oct 2, 2024 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… · ★49 · Nov 13, 2023 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. · ★32 · Sep 19, 2025 · Updated 6 months ago
- Data and tools for generating and inspecting OLMo pre-training data. · ★1,460 · Nov 5, 2025 · Updated 4 months ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ★7,201 · Jul 11, 2024 · Updated last year
- Web UI & Backend for Data Annotations in Aya · ★30 · Mar 16, 2024 · Updated 2 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… · ★111 · Sep 10, 2023 · Updated 2 years ago
- Microsoft Automatic Mixed Precision Library · ★635 · Dec 1, 2025 · Updated 3 months ago
- This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and bench… · ★600 · Nov 17, 2023 · Updated 2 years ago
- LLM GPU Calculator · ★21 · Aug 19, 2023 · Updated 2 years ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models · ★17 · Jun 28, 2025 · Updated 8 months ago
- Efficient few-shot learning with Sentence Transformers · ★2,699 · Dec 11, 2025 · Updated 3 months ago
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks. · ★26 · Dec 9, 2024 · Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · ★191 · Mar 12, 2026 · Updated last week
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond · ★13 · Aug 15, 2022 · Updated 3 years ago
- Crosslingual Question Answering for African Languages · ★31 · Sep 27, 2024 · Updated last year