facebookresearch/llm-speedrunner

The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in language modeling.

☆145

Alternatives and similar repositories for llm-speedrunner

Users who are interested in llm-speedrunner are comparing it to the libraries listed below.

facebookresearch / aira-dojo
AIRA-dojo: a framework for developing and evaluating AI research agents
☆154 · Apr 14, 2026 · Updated 3 months ago
uq-project / UQ
UQ: Assessing Language Models on Unsolved Questions
☆30 · Aug 26, 2025 · Updated 11 months ago
KempnerInstitute / llm_uncertainty
Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"
☆11 · Jul 18, 2026 · Updated last week
Zcchill / Value-Residual-Learning
☆15 · Mar 20, 2025 · Updated last year
METR / RE-Bench
☆145 · Oct 16, 2025 · Updated 9 months ago
HazyResearch / scaling-verification
☆26 · Sep 4, 2025 · Updated 10 months ago
xeophon / beam
☆16 · Feb 22, 2026 · Updated 5 months ago
fjzzq2002 / WeightWatch
Official repository of the paper "Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs"
☆15 · Sep 25, 2025 · Updated 10 months ago
hiverge / cifar10-speedrun
CIFAR-10 speedrun: Trains to 94% accuracy in 1.98 seconds on a single NVIDIA A100 GPU.
☆79 · Oct 17, 2025 · Updated 9 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆340 · Mar 31, 2026 · Updated 3 months ago
Kernel-Machines / kermac
PyTorch routines for (Ker)nel (Mac)hines
☆12 · Oct 10, 2025 · Updated 9 months ago
microsoft / DGT
Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent
☆16 · Sep 8, 2022 · Updated 3 years ago
nikhilvyas / SOAP
☆275 · Dec 2, 2024 · Updated last year
PrimeIntellect-ai / lab-cookbook
Lab Cookbook
☆37 · Updated this week
PrimeIntellect-ai / prime-rl
Agentic RL Training at Scale
☆1,724 · Updated this week
Jaykef / Triton-nanoGPT
Custom Triton kernels for training Karpathy's nanoGPT.
☆19 · Oct 21, 2024 · Updated last year
ScalingIntelligence / kernelbench-tinker
Tinker ↔ KernelBench Integration enabling RL for GPU Kernel Generation
☆29 · Mar 5, 2026 · Updated 4 months ago
alex-damian / EOS
☆15 · Sep 29, 2022 · Updated 3 years ago
WecoAI / weco-cli
Production-Grade Autoresearch. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizabl…
☆80 · Updated this week
open-thought / reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,468 · Apr 17, 2026 · Updated 3 months ago
tianyi-lab / R2-T2
[ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"
☆19 · Mar 10, 2025 · Updated last year
aisa-group / PostTrainBench
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆467 · Updated this week
ethanhe42 / nanoRL
☆128 · Jun 1, 2026 · Updated last month
muellerzr / smol-moe
☆25 · Oct 10, 2025 · Updated 9 months ago
haileyschoelkopf / triton-index
See https://github.com/cuda-mode/triton-index/ instead!
☆11 · May 8, 2024 · Updated 2 years ago
kvfrans / matrix-whitening
Code for "What really matters in matrix-whitening optimizers?"
☆25 · Oct 31, 2025 · Updated 8 months ago
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21 · Jul 29, 2024 · Updated last year
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆282 · Jun 11, 2026 · Updated last month
facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…
☆531 · Updated this week
egrefen / splitappendix
A simple shell script for splitting the PDF of a paper into the main body and an appendix.
☆18 · Jun 1, 2020 · Updated 6 years ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 · Oct 30, 2024 · Updated last year
fKunstner / noise-sgd-adam-sign
☆16 · Apr 26, 2023 · Updated 3 years ago
sanyalsunny111 / Looped-GPT
Minimal and highly hackable implementation of Looped Transformers with GPT
☆25 · Mar 8, 2026 · Updated 4 months ago
thunlp / SparsingLaw
The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆32 · Nov 12, 2024 · Updated last year
gso-bench / gso
[NeurIPS '25] GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
☆87 · Jul 12, 2026 · Updated 2 weeks ago
cybertronai / Megatron-LM
Ongoing research training transformer language models at scale, including: BERT
☆16 · Apr 25, 2019 · Updated 7 years ago
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆231 · Nov 4, 2025 · Updated 8 months ago
KellerJordan / modded-nanogpt
NanoGPT (124M) in 90 seconds
☆5,581 · Jul 3, 2026 · Updated 3 weeks ago
Mercor-Intelligence / apex-evals
☆15 · Jun 19, 2026 · Updated last month