Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
☆340 · Dec 18, 2024 · Updated last year
Alternatives and similar repositories for belebele
Users who are interested in belebele compare it to the repositories listed below.
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 · ☆20 · Apr 6, 2025 · Updated 10 months ago
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback · ☆96 · Aug 18, 2023 · Updated 2 years ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. · ☆2,915 · Updated this week
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… · ☆226 · Sep 18, 2025 · Updated 5 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… · ☆111 · Sep 10, 2023 · Updated 2 years ago
- Multilingual Large Language Models Evaluation Benchmark · ☆132 · Aug 21, 2024 · Updated last year
- COMET for African languages · ☆10 · Jan 24, 2025 · Updated last year
- ☆10 · Oct 2, 2024 · Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… · ☆31 · Aug 25, 2023 · Updated 2 years ago
- ☆266 · Aug 1, 2025 · Updated 7 months ago
- Python interface for evaluation on ChatGPT models · ☆19 · Feb 13, 2024 · Updated 2 years ago
- Code for fine-tuning Platypus family LLMs using LoRA · ☆629 · Feb 4, 2024 · Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 · ☆106 · Apr 20, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ☆7,187 · Jul 11, 2024 · Updated last year
- Data and tools for generating and inspecting OLMo pre-training data. · ☆1,416 · Nov 5, 2025 · Updated 3 months ago
- Robust recipes to align language models with human and AI preferences · ☆5,510 · Sep 8, 2025 · Updated 5 months ago
- NTREX -- News Test References for MT Evaluation · ☆88 · Jun 5, 2024 · Updated last year
- Seamless Voice Interactions with LLMs · ☆12 · Oct 28, 2023 · Updated 2 years ago
- This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and bench… · ☆600 · Nov 17, 2023 · Updated 2 years ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… · ☆2,175 · Oct 8, 2024 · Updated last year
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" · ☆30 · Apr 2, 2022 · Updated 3 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. · ☆32 · Sep 19, 2025 · Updated 5 months ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents · ☆555 · Oct 28, 2023 · Updated 2 years ago
- LLM GPU Calculator · ☆21 · Aug 19, 2023 · Updated 2 years ago
- Microsoft Automatic Mixed Precision Library · ☆636 · Dec 1, 2025 · Updated 3 months ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… · ☆49 · Nov 13, 2023 · Updated 2 years ago
- Facebook Low Resource (FLoRes) MT Benchmark · ☆766 · Nov 20, 2023 · Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… · ☆35 · Aug 15, 2023 · Updated 2 years ago
- Stanford NLP Python library for Representation Finetuning (ReFT) · ☆1,560 · Jan 14, 2026 · Updated last month
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… · ☆28 · Apr 17, 2024 · Updated last year
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 · ☆188 · Nov 19, 2025 · Updated 3 months ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning · ☆35 · Aug 9, 2023 · Updated 2 years ago
- Code for Zero-Shot Tokenizer Transfer · ☆143 · Jan 14, 2025 · Updated last year
- LLM as a Chatbot Service · ☆3,332 · Nov 20, 2023 · Updated 2 years ago
- Salesforce open-source LLMs with 8k sequence length. · ☆725 · Jan 31, 2025 · Updated last year
- MPI Code Generation through Domain-Specific Language Models · ☆14 · Nov 19, 2024 · Updated last year
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation · ☆14 · Aug 19, 2025 · Updated 6 months ago
- ParaNames: A multilingual resource for parallel names · ☆39 · May 20, 2024 · Updated last year