Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
★338 · Dec 18, 2024 · Updated last year
Alternatives and similar repositories for belebele
Users that are interested in belebele are comparing it to the libraries listed below.
- GlotCC Dataset and Pipeline -- NeurIPS 2024 · ★20 · Apr 6, 2025 · Updated last year
- ★19 · Sep 16, 2025 · Updated 6 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback · ★96 · Aug 18, 2023 · Updated 2 years ago
- COMET for African languages · ★11 · Jan 24, 2025 · Updated last year
- ★270 · Aug 1, 2025 · Updated 8 months ago
- We introduce MKQA, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically … · ★193 · Jun 16, 2022 · Updated 3 years ago
- Python interface for evaluation on ChatGPT models · ★19 · Feb 13, 2024 · Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark · ★132 · Aug 21, 2024 · Updated last year
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 · ★106 · Apr 20, 2024 · Updated last year
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ★2,983 · Updated this week
- Facebook Low Resource (FLoRes) MT Benchmark · ★766 · Nov 20, 2023 · Updated 2 years ago
- Shami Dialect Corpus (SDC) · ★29 · Feb 13, 2018 · Updated 8 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation · ★14 · Aug 19, 2025 · Updated 7 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" · ★30 · Apr 2, 2022 · Updated 4 years ago
- Robust recipes to align language models with human and AI preferences · ★5,558 · Updated this week
- ★1,266 · Jul 30, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… · ★226 · Sep 18, 2025 · Updated 6 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… · ★29 · Apr 17, 2024 · Updated last year
- Code for fine-tuning Platypus fam LLMs using LoRA · ★628 · Feb 4, 2024 · Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… · ★34 · Aug 15, 2023 · Updated 2 years ago
- Human Rights Corpus (#인권코퍼스) · ★31 · Oct 6, 2023 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… · ★31 · Aug 25, 2023 · Updated 2 years ago
- ★19 · May 23, 2024 · Updated last year
- ★10 · Oct 2, 2024 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… · ★49 · Nov 13, 2023 · Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. · ★32 · Sep 19, 2025 · Updated 6 months ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ★7,208 · Jul 11, 2024 · Updated last year
- Web UI & Backend for Data Annotations in Aya · ★30 · Mar 16, 2024 · Updated 2 years ago
- Data and tools for generating and inspecting OLMo pre-training data. · ★1,476 · Nov 5, 2025 · Updated 5 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… · ★111 · Sep 10, 2023 · Updated 2 years ago
- Microsoft Automatic Mixed Precision Library · ★636 · Dec 1, 2025 · Updated 4 months ago
- This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and bench… · ★601 · Nov 17, 2023 · Updated 2 years ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models · ★17 · Jun 28, 2025 · Updated 9 months ago
- Efficient few-shot learning with Sentence Transformers · ★2,710 · Apr 2, 2026 · Updated last week
- This is the official repository for Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks. · ★26 · Dec 9, 2024 · Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · ★197 · Mar 27, 2026 · Updated 2 weeks ago
- List of all the resources I developed in collaboration with LSV and Masakhane during my doctoral studies and beyond · ★13 · Aug 15, 2022 · Updated 3 years ago
- Crosslingual Question Answering for African Languages · ★31 · Sep 27, 2024 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ★62 · Aug 30, 2024 · Updated last year