Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
★339 · Dec 18, 2024 · Updated last year
Alternatives and similar repositories for belebele
Users that are interested in belebele are comparing it to the libraries listed below.
- [NeurIPS 2024] GlotCC Dataset and Pipeline ★20 · Apr 6, 2025 · Updated last year
- ★19 · Apr 26, 2026 · Updated last week
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ★96 · Aug 18, 2023 · Updated 2 years ago
- COMET for African languages ★11 · Jan 24, 2025 · Updated last year
- ★272 · Aug 1, 2025 · Updated 9 months ago
- Python interface for evaluation on ChatGPT models ★19 · Feb 13, 2024 · Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark ★132 · Aug 21, 2024 · Updated last year
- [ACL 2023] Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages ★106 · Apr 14, 2026 · Updated 3 weeks ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★3,033 · Apr 20, 2026 · Updated 2 weeks ago
- Facebook Low Resource (FLoRes) MT Benchmark ★767 · Nov 20, 2023 · Updated 2 years ago
- Shami Dialect Corpus (SDC) ★29 · Feb 13, 2018 · Updated 8 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ★14 · Aug 19, 2025 · Updated 8 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ★30 · Apr 2, 2022 · Updated 4 years ago
- Robust recipes to align language models with human and AI preferences ★5,593 · Apr 8, 2026 · Updated 3 weeks ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ★226 · Sep 18, 2025 · Updated 7 months ago
- ★1,271 · Jul 30, 2024 · Updated last year
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ★29 · Apr 17, 2024 · Updated 2 years ago
- Code for fine-tuning Platypus fam LLMs using LoRA ★628 · Feb 4, 2024 · Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ★34 · Aug 15, 2023 · Updated 2 years ago
- #μΈκΆμ½νΌμ€ ★31 · Oct 6, 2023 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ★31 · Aug 25, 2023 · Updated 2 years ago
- ★19 · Apr 21, 2026 · Updated 2 weeks ago
- ★10 · Oct 2, 2024 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ★49 · Nov 13, 2023 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ★32 · Sep 19, 2025 · Updated 7 months ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ★7,225 · Jul 11, 2024 · Updated last year
- Web UI & Backend for Data Annotations in Aya ★30 · Mar 16, 2024 · Updated 2 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ★112 · Sep 10, 2023 · Updated 2 years ago
- Microsoft Automatic Mixed Precision Library ★636 · Dec 1, 2025 · Updated 5 months ago
- This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and bench… ★602 · Nov 17, 2023 · Updated 2 years ago
- Data and tools for generating and inspecting OLMo pre-training data. ★1,492 · Nov 5, 2025 · Updated 6 months ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ★17 · Jun 28, 2025 · Updated 10 months ago
- Efficient few-shot learning with Sentence Transformers ★2,724 · Apr 17, 2026 · Updated 2 weeks ago
- [EMNLP 2023] Language Identification with Support for More Than 2000 Labels ★200 · Apr 15, 2026 · Updated 3 weeks ago
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks. ★26 · Dec 9, 2024 · Updated last year
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond ★13 · Aug 15, 2022 · Updated 3 years ago
- Crosslingual Question Answering for African Languages ★31 · Sep 27, 2024 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ★62 · Aug 30, 2024 · Updated last year
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ★21 · May 2, 2024 · Updated 2 years ago