google/BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

☆3,250

Alternatives and similar repositories for BIG-bench

Users interested in BIG-bench are comparing it to the libraries listed below.

EleutherAI / lm-evaluation-harness
A framework for few-shot evaluation of language models.
☆13,443 · Jul 13, 2026 · Updated 2 weeks ago
stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models …
☆2,865 · Jul 1, 2026 · Updated 3 weeks ago
bigscience-workshop / promptsource
Toolkit for creating, sharing and using natural language prompts.
☆3,028 · Oct 23, 2023 · Updated 2 years ago
hendrycks / test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,603 · May 28, 2023 · Updated 3 years ago
allenai / natural-instructions
Expanding natural instructions
☆1,045 · Dec 11, 2023 · Updated 2 years ago
GEM-benchmark / NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
☆786 · May 19, 2024 · Updated 2 years ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,752 · Jan 8, 2024 · Updated 2 years ago
suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆566 · Jun 25, 2024 · Updated 2 years ago
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,776 · Aug 4, 2024 · Updated last year
ruixiangcui / AGIEval
☆774 · Jun 13, 2024 · Updated 2 years ago
EleutherAI / pythia
The hub for EleutherAI's work on interpretability and learning dynamics
☆2,865 · Nov 15, 2025 · Updated 8 months ago
google-research / FLAN
☆1,566 · Jul 2, 2026 · Updated 3 weeks ago
openai / evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
☆19,035 · Apr 14, 2026 · Updated 3 months ago
facebookresearch / metaseq
Repo for external large-scale work
☆6,551 · Apr 27, 2024 · Updated 2 years ago
bigscience-workshop / t-zero
Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
☆463 · Nov 5, 2022 · Updated 3 years ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,152 · Jun 1, 2023 · Updated 3 years ago
google-research / t5x
☆2,977 · Jul 9, 2026 · Updated 2 weeks ago
allenai / RL4LMs
A modular RL library to fine-tune language models to human preferences
☆2,393 · Mar 1, 2024 · Updated 2 years ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,953 · Updated this week
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,853 · Jun 17, 2025 · Updated last year
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,250 · Sep 30, 2025 · Updated 9 months ago
openai / human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
☆3,324 · Jan 17, 2025 · Updated last year
google / seqio
Task-based datasets, preprocessing, and evaluation for sequence models.
☆594 · Jul 2, 2026 · Updated 3 weeks ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,811 · Updated this week
tatsu-lab / stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
☆30,246 · Jul 17, 2024 · Updated 2 years ago
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,460 · Updated this week
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,172 · Jan 23, 2026 · Updated 6 months ago
google-research / text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
☆6,540 · Jul 8, 2026 · Updated 3 weeks ago
thunlp / PromptPapers
Must-read papers on prompt-based tuning for pre-trained language models.
☆4,324 · Jul 17, 2023 · Updated 3 years ago
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,247 · Updated this week
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆3,611 · Feb 8, 2026 · Updated 5 months ago
EleutherAI / gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
☆7,448 · Jun 11, 2026 · Updated last month
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
☆5,651 · May 26, 2026 · Updated 2 months ago
facebookresearch / KILT
Library for Knowledge Intensive Language Tasks
☆979 · Mar 31, 2022 · Updated 4 years ago
yizhongw / self-instruct
Aligning pretrained language models with instruction data generated by themselves.
☆4,607 · Mar 27, 2023 · Updated 3 years ago
facebookresearch / LAMA
LAnguage Model Analysis
☆1,391 · Jul 7, 2024 · Updated 2 years ago
deepspeedai / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆42,827 · Updated this week
lm-sys / FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
☆39,505 · May 1, 2026 · Updated 2 months ago
thunlp / OpenPrompt
An Open-Source Framework for Prompt-Learning.
☆4,886 · Jul 16, 2024 · Updated 2 years ago