facebookresearch/belebele
Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
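Belebele is typically consumed through the Hugging Face `datasets` library. The sketch below is a minimal loading example, assuming the dataset is mirrored on the Hub as `facebook/belebele` with per-language configs (e.g. `eng_Latn`) and the field names used in the official release; check the repo for the exact schema.

```python
# Minimal sketch: load one Belebele language config with Hugging Face `datasets`.
# Assumes a Hub mirror at "facebook/belebele" with per-language configs such as
# "eng_Latn" and the field names below; verify the schema against the repo.
from datasets import load_dataset

belebele = load_dataset("facebook/belebele", "eng_Latn", split="test")

for row in belebele.select(range(3)):
    # Each example pairs a FLORES passage with one multiple-choice question
    # and four answer options.
    print(row["flores_passage"][:80], "...")
    print("Q:", row["question"])
    for i in range(1, 5):
        print(f"  {i}. {row[f'mc_answer{i}']}")
    print("correct option:", row["correct_answer_num"])
```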
Related projects
Alternatives and complementary repositories for belebele
- Manage scalable open LLM inference endpoints in Slurm clusters
- The official evaluation suite and dynamic data release for MixEval.
- Let's build better datasets, together!
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes
- A large-scale multilingual dataset for Information Retrieval. Thorough human annotations across 18 diverse languages.
- Repo for the paper "Shepherd: A Critic for Language Model Generation"
- Repository for the paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning"
- Easily embed, cluster and semantically label text datasets
- An Open Source Toolkit For LLM Distillation
- Code for fine-tuning Platypus-family LLMs using LoRA
- Scaling Data-Constrained Language Models
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.
- OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA
- Build, evaluate, understand, and fix LLM-based apps
- Official repository for ORPO
- DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI
- Toolkit for attaching, training, saving and loading new heads for transformer models
- An open collection of implementation tips, tricks and resources for training large language models
- Dense X Retrieval: What Retrieval Granularity Should We Use?
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B for free
- Batched LoRAs
- Reverse Instructions to generate instruction-tuning data with corpus examples
- Automatically evaluate your LLMs in Google Colab
- A bagel, with everything.
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
- Multipack distributed sampler for fast padding-free training of LLMs
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining