facebookresearch / belebele
Repo for the Belebele dataset, a massively multilingual reading comprehension dataset.
☆317 Updated last month
Alternatives and similar repositories for belebele:
Users interested in belebele compare it to the libraries listed below.
- Manage scalable open LLM inference endpoints in Slurm clusters ☆249 Updated 6 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆225 Updated 2 months ago
- Build, evaluate, understand, and fix LLM-based apps ☆485 Updated last year
- ☆489 Updated 2 months ago
- A bagel, with everything. ☆315 Updated 9 months ago
- Let's build better datasets, together! ☆250 Updated last month
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆66 Updated 3 months ago
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆184 Updated last year
- An open collection of implementation tips, tricks and resources for training large language models ☆468 Updated last year
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆92 Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆687 Updated 9 months ago
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ☆312 Updated last month
- Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆111 Updated 2 months ago
- Automatically evaluate your LLMs in Google Colab ☆584 Updated 8 months ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆175 Updated last year
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆182 Updated 3 months ago
- data cleaning and curation for unstructured text ☆329 Updated 5 months ago
- Website for hosting the Open Foundation Models Cheat Sheet. ☆263 Updated 7 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆296 Updated last year
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆218 Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆163 Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆184 Updated 5 months ago
- batched loras ☆338 Updated last year
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆67 Updated 10 months ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆208 Updated last year
- DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI ☆488 Updated this week
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆313 Updated 2 months ago
- experiments with inference on llama ☆104 Updated 7 months ago
- A large-scale multilingual dataset for Information Retrieval. Thorough human-annotations across 18 diverse languages. ☆173 Updated 5 months ago
- This project is an attempt to create a common metric to test LLM's for progress in eliminating hallucinations which is the most serious c… ☆221 Updated last year