google-deepmind/bbeh



☆126

Alternatives and similar repositories for bbeh

Users who are interested in bbeh are comparing it to the libraries listed below.


Jiahao004 / DeepTheorem
☆27 · Jun 10, 2025 · Updated last year
chenllliang / MMEvalPro
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆25 · Sep 26, 2024 · Updated last year
formll / resolving-scaling-law-discrepancies
☆19 · Nov 4, 2025 · Updated 8 months ago
nishadsinghi / sc-genrm-scaling
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15 · Oct 31, 2025 · Updated 8 months ago
alon-albalak / online-data-mixing
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14 · Jan 9, 2024 · Updated 2 years ago
linhaowei1 / kumo
☁️ KUMO: Generative Evaluation of Complex Reasoning in Large Language Models
☆20 · Jun 4, 2025 · Updated last year
hkust-nlp / llm-compression-intelligence
Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆150 · Sep 20, 2024 · Updated last year
THU-KEG / PairJudgeRM
☆15 · Apr 14, 2025 · Updated last year
kevinwu23 / StanfordFineTuneBench
☆32 · Nov 14, 2024 · Updated last year
apple / ml-reversal-blessing
☆17 · Jul 31, 2025 · Updated 11 months ago
Hritikbansal / sparse_feedback
☆29 · Jan 23, 2024 · Updated 2 years ago
arubique / OCCAM
This is an implementation of the paper "Are We Done with Object-Centric Learning?"
☆13 · Jun 21, 2026 · Updated last month
suzgunmirac / BIG-Bench-Hard
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
☆566 · Jun 25, 2024 · Updated 2 years ago
jylei16 / Imagine-e
☆14 · Jan 22, 2025 · Updated last year
joykirat18 / How-To-Think-Step-by-Step
How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning
☆26 · Aug 29, 2024 · Updated last year
huggingface / Math-Verify
☆1,170 · Jan 10, 2026 · Updated 6 months ago
Hritikbansal / jpo
☆13 · Jul 2, 2025 · Updated last year
open-thought / reasoning-gym-eval
Collection of LLM completions for reasoning-gym task datasets
☆31 · Jul 4, 2025 · Updated last year
rohan598 / ConTextual
☆27 · Jul 20, 2024 · Updated 2 years ago
UKPLab / acl2024-ircoder
Data creation, training and eval scripts for the IRCoder paper
☆21 · May 31, 2024 · Updated 2 years ago
davisrbr / conjectures-arxiv
OpenConjecture, a dataset of mathematics conjectures pulled from papers published to the ArXiv
☆15 · Jul 12, 2026 · Updated last week
ibivu / protein-glue
Accompanying code for the ProteinGLUE method
☆13 · Apr 12, 2022 · Updated 4 years ago
open-nlplab / fastchatgpt
A Python tool that helps you interact with ChatGPT.
☆10 · Dec 11, 2022 · Updated 3 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆83 · May 2, 2025 · Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆90 · Oct 1, 2024 · Updated last year
reka-ai / reka-vibe-eval
Multimodal language model benchmark, featuring challenging examples
☆189 · Dec 18, 2024 · Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 · Oct 30, 2024 · Updated last year
ShadeCloak / ADORA
☆47 · Apr 9, 2025 · Updated last year
yichengchen24 / DataChef
☆25 · Feb 12, 2026 · Updated 5 months ago
Infini-AI-Lab / M2PO
☆34 · Oct 8, 2025 · Updated 9 months ago
morse-benchmark / morse-500
☆31 · May 21, 2026 · Updated 2 months ago
ScalerLab / JudgeBench
☆128 · Nov 7, 2024 · Updated last year
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆102 · Apr 2, 2024 · Updated 2 years ago
sinwang20 / D2PO
[ACL 2025] "World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning." https://arxiv.org/abs/2503.1…
☆18 · Jul 22, 2025 · Updated last year
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆194 · Feb 17, 2025 · Updated last year
trishullab / itp-interface
Generic interface for hooking up to any Interactive Theorem Prover (ITP) and collecting data for training ML models for AI in formal theo…
☆19 · Jul 10, 2026 · Updated 2 weeks ago
hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆24 · Oct 7, 2025 · Updated 9 months ago
Purewhite2019 / formal_problem_solving_main
[ICML'26 Spotlight] Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
☆31 · Jun 29, 2026 · Updated 3 weeks ago