facebookresearch/MLGym

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

☆612

Alternatives and similar repositories for MLGym

Users who are interested in MLGym are comparing it to the repositories listed below.

facebookresearch / swe-rl
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆712 · Mar 16, 2025 · Updated last year
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆709 · Jul 29, 2025 · Updated 11 months ago
openai / frontier-evals
OpenAI Frontier Evals
☆1,262 · Apr 21, 2026 · Updated 3 months ago
open-thought / reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,464 · Apr 17, 2026 · Updated 3 months ago
mll-lab-nu / RAGEN
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
☆2,756 · Apr 14, 2026 · Updated 3 months ago
FLAIROx / cultural-accumulation
☆16 · Jul 16, 2024 · Updated 2 years ago
du-nlp-lab / MLR-Copilot
☆70 · Mar 30, 2025 · Updated last year
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆4,400 · Updated this week
openai / mle-bench
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
☆1,655 · Apr 24, 2026 · Updated 3 months ago
facebookresearch / aira-dojo
AIRA-dojo: a framework for developing and evaluating AI research agents
☆154 · Apr 14, 2026 · Updated 3 months ago
DigiRL-agent / digirl
Official repo for paper DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning.
☆393 · Feb 22, 2025 · Updated last year
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆2,088 · Updated this week
facebookresearch / sweet_rl
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
☆271 · May 5, 2025 · Updated last year
ivanleomk / modal-grpo
☆19 · Mar 16, 2025 · Updated last year
WecoAI / aideml
AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.
☆1,442 · Jul 15, 2026 · Updated last week
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,727 · Updated this week
rungalileo / agent-leaderboard
Ranking LLMs on agentic tasks
☆224 · May 21, 2026 · Updated 2 months ago
TheAgentCompany / TheAgentCompany
An agent benchmark with tasks in a simulated software company.
☆751 · Nov 17, 2025 · Updated 8 months ago
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,268 · Aug 27, 2025 · Updated 10 months ago
nmonette / NCC-UED
Official Implementation of `An Optimisation Framework for Unsupervised Environment Design` from RLC 2025
☆17 · Nov 24, 2025 · Updated 8 months ago
openai / SWELancer-Benchmark
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E…
☆1,435 · Jul 18, 2025 · Updated last year
vsubramaniam851 / multiagent-ft
☆234 · Feb 24, 2025 · Updated last year
facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…
☆531 · Updated this week
DigiRL-agent / digiq
☆121 · Apr 8, 2025 · Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆208 · Apr 17, 2025 · Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆281 · Oct 3, 2025 · Updated 9 months ago
R2E-Gym / R2E-Gym
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆309 · Jul 13, 2025 · Updated last year
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,649 · Updated this week
axon-rl / gem
A Gym for Agentic LLMs
☆502 · Jan 21, 2026 · Updated 6 months ago
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,870 · Dec 23, 2025 · Updated 7 months ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,808 · Updated this week
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆213 · Jul 31, 2023 · Updated 2 years ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆428 · Updated this week
Aloriosa / srmt
The original Shared Recurrent Memory Transformer implementation
☆36 · Jul 11, 2025 · Updated last year
ShengranHu / ADAS
[ICLR 2025] Automated Design of Agentic Systems
☆1,619 · Jan 28, 2025 · Updated last year
uq-project / UQ
UQ: Assessing Language Models on Unsolved Questions
☆30 · Aug 26, 2025 · Updated 10 months ago
huggingface / trl
Train transformer language models with reinforcement learning.
☆18,920 · Updated this week
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,510 · Aug 3, 2025 · Updated 11 months ago
Agent-RL / ReCall
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei…
☆1,412 · May 16, 2025 · Updated last year