Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
☆339 · Dec 18, 2024 · Updated last year
Alternatives and similar repositories for belebele
Users that are interested in belebele are comparing it to the libraries listed below.
- [NeurIPS 2024] GlotCC Dataset and Pipeline ☆20 · Apr 6, 2025 · Updated last year
- ☆19 · Apr 26, 2026 · Updated last month
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆96 · Aug 18, 2023 · Updated 2 years ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- ☆272 · Aug 1, 2025 · Updated 9 months ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically … ☆194 · Jun 16, 2022 · Updated 3 years ago
- Multilingual Large Language Models Evaluation Benchmark ☆133 · Aug 21, 2024 · Updated last year
- Python interface for evaluation on ChatGPT models ☆19 · Feb 13, 2024 · Updated 2 years ago
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ☆106 · Apr 14, 2026 · Updated last month
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆3,066 · May 6, 2026 · Updated 2 weeks ago
- Facebook Low Resource (FLoRes) MT Benchmark ☆768 · Nov 20, 2023 · Updated 2 years ago
- Shami Dialect Corpus (SDC) ☆29 · Feb 13, 2018 · Updated 8 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 · Aug 19, 2025 · Updated 9 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Apr 2, 2022 · Updated 4 years ago
- Robust recipes to align language models with human and AI preferences ☆5,605 · Apr 8, 2026 · Updated last month
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 8 months ago
- ☆1,271 · Jul 30, 2024 · Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆29 · Apr 17, 2024 · Updated 2 years ago
- Code for fine-tuning Platypus fam LLMs using LoRA ☆626 · Feb 4, 2024 · Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Aug 15, 2023 · Updated 2 years ago
- #인권코퍼스 (Human Rights Corpus) ☆31 · Oct 6, 2023 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Aug 25, 2023 · Updated 2 years ago
- ☆19 · Apr 21, 2026 · Updated last month
- ☆10 · Oct 2, 2024 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆49 · Nov 13, 2023 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 8 months ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,229 · Jul 11, 2024 · Updated last year
- Web UI & Backend for Data Annotations in Aya ☆30 · Mar 16, 2024 · Updated 2 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆112 · Sep 10, 2023 · Updated 2 years ago
- Microsoft Automatic Mixed Precision Library ☆636 · Dec 1, 2025 · Updated 5 months ago
- This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and bench… ☆604 · Nov 17, 2023 · Updated 2 years ago
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,499 · Nov 5, 2025 · Updated 6 months ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 · Jun 28, 2025 · Updated 10 months ago
- LLM GPU Calculator ☆21 · Aug 19, 2023 · Updated 2 years ago
- Efficient few-shot learning with Sentence Transformers ☆2,741 · Apr 17, 2026 · Updated last month
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks. ☆26 · Dec 9, 2024 · Updated last year
- [EMNLP 2023] Language Identification with Support for More Than 2000 Labels ☆204 · Apr 15, 2026 · Updated last month
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ☆13 · Aug 15, 2022 · Updated 3 years ago
- Crosslingual Question Answering for African Languages ☆31 · Sep 27, 2024 · Updated last year