facebookresearch/aira-dojo

AIRA-dojo: a framework for developing and evaluating AI research agents

☆154

Alternatives and similar repositories for aira-dojo

Users who are interested in aira-dojo are comparing it to the libraries listed below.

facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆145 · May 6, 2026 · Updated 2 months ago
openai / mle-bench
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
☆1,655 · Apr 24, 2026 · Updated 3 months ago
sjtu-sai-agents / ML-Master
The official implementation of "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning"
☆436 · Mar 29, 2026 · Updated 3 months ago
facebookresearch / CATransformers
CATransformers is a framework for joint neural network and hardware architecture search.
☆24 · Mar 17, 2026 · Updated 4 months ago
MLE-Dojo / MLE-Dojo
☆99 · Oct 30, 2025 · Updated 8 months ago
AlexGoldie / rl-learned-optimization
Official implementation of "Can Learned Optimization Make Reinforcement Learning Less Difficult"
☆31 · Dec 15, 2025 · Updated 7 months ago
facebookresearch / decrypto
Implementation of the Decrypto benchmark for multi-agent reasoning and theory of mind.
☆22 · Jan 19, 2026 · Updated 6 months ago
GAIR-NLP / DatasetResearch
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery
☆23 · Sep 24, 2025 · Updated 10 months ago
AlexGoldie / learn-rl-algorithms
Official implementation for "How Should We Meta-Learn Reinforcement Learning Algorithms?"
☆23 · Sep 7, 2025 · Updated 10 months ago
jfc43 / MARS
MARS, a framework optimized for autonomous AI research
☆39 · May 19, 2026 · Updated 2 months ago
facebookresearch / MLGym
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
☆612 · Aug 10, 2025 · Updated 11 months ago
WecoAI / aideml
AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.
☆1,443 · Jul 15, 2026 · Updated last week
jaehyun513 / MLE-STAR
☆17 · Aug 26, 2025 · Updated 11 months ago
facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…
☆531 · Updated this week
baidubce / FM-Agent
☆138 · Mar 31, 2026 · Updated 3 months ago
facebookresearch / airs-bench
AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents
☆104 · May 5, 2026 · Updated 2 months ago
deepakn97 / pare
A research framework for evaluating proactive AI assistants through active user simulation
☆35 · May 23, 2026 · Updated 2 months ago
google-deepmind / regress-lm
Library for sequence-to-sequence numeric prediction, applicable to any tokenizable input, and allows pretraining and fine-tuning over mul…
☆349 · Jul 3, 2026 · Updated 3 weeks ago
AlexGoldie / discogen
Official implementation of DiscoGen, for "Procedural Generation of Algorithm Discovery Tasks in Machine Learning"
☆48 · Jul 2, 2026 · Updated 3 weeks ago
howard-yen / SLIM
☆27 · Jun 22, 2026 · Updated last month
BatsResearch / fudd
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
☆11 · Nov 15, 2023 · Updated 2 years ago
METR / RE-Bench
☆145 · Oct 16, 2025 · Updated 9 months ago
flowersteam / WorldLLM
LLM as World Models using Bayesian inference
☆21 · May 27, 2025 · Updated last year
seanie12 / ThinkSafe
☆20 · May 4, 2026 · Updated 2 months ago
MASWorks / ML-Agent
The official implementation of "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering"
☆71 · Jun 21, 2025 · Updated last year
facebookresearch / swe-rl
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆712 · Mar 16, 2025 · Updated last year
aisa-group / PostTrainBench
Measuring how well CLI agents like Claude Code or Codex CLI can post-train base LLMs on a single H100 GPU in 10 hours
☆467 · Updated this week
UW-COSMOS / project-docs
Project overview, roadmap and initial result reports
☆11 · Aug 6, 2022 · Updated 3 years ago
uoe-agents / reading-group
Propose & vote on reading group papers in the "Discussions" tab.
☆12 · Feb 20, 2024 · Updated 2 years ago
allenai / neurodiscoverybench
☆22 · Jan 29, 2026 · Updated 5 months ago
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17 · Apr 4, 2024 · Updated 2 years ago
purbeshmitra / MOTIF
MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
☆17 · Jul 6, 2025 · Updated last year
ml-feedback-sys / materials-f23
☆10 · Nov 15, 2023 · Updated 2 years ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆282 · Jun 11, 2026 · Updated last month
ars22 / e3
☆20 · Sep 16, 2025 · Updated 10 months ago
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆157 · Jun 9, 2025 · Updated last year
flowersteam / SOAR
Implementation of SOAR
☆55 · Sep 17, 2025 · Updated 10 months ago
facebookresearch / ExploreToM
Code for ExploreToM
☆93 · Jun 25, 2025 · Updated last year
WecoAI / weco-cli
Production-Grade Autoresearch. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizabl…
☆80 · Updated this week