facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.
☆305 · Updated this week
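To make the "evolving environment" idea concrete, here is a minimal sketch of the kind of evaluation loop the description implies. Everything below (class names, methods, events) is an illustrative assumption, not the platform's actual API:

```python
# Illustrative sketch only: hypothetical stand-ins, not the actual
# meta-agents-research-environments API. It shows the general shape of a
# dynamic evaluation loop where new information arrives mid-episode.
import random
from dataclasses import dataclass, field


@dataclass
class DynamicEnvironment:
    """A toy environment whose task state can change while an episode runs."""
    pending_events: list = field(default_factory=lambda: [
        "user adds a new constraint",
        "an app returns fresh data",
    ])
    log: list = field(default_factory=list)

    def step(self, action: str) -> str:
        self.log.append(action)
        # New information may appear after the agent has already acted,
        # so a fixed plan made at t=0 can become stale.
        if self.pending_events and random.random() < 0.5:
            return self.pending_events.pop(0)
        return "no new events"


def run_episode(env: DynamicEnvironment, max_steps: int = 5) -> list:
    plan = ["gather context", "act on task"]
    observations = []
    for _ in range(max_steps):
        action = plan.pop(0) if plan else "finish"
        obs = env.step(action)
        observations.append(obs)
        if obs != "no new events":
            # The agent must revise its strategy when the environment evolves.
            plan.insert(0, f"replan for: {obs}")
        if action == "finish":
            break
    return observations


if __name__ == "__main__":
    random.seed(0)
    print(run_episode(DynamicEnvironment()))
```

The static-benchmark analogue would score a single fixed plan; here the score depends on whether the agent replans when mid-episode events invalidate its initial strategy.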
Alternatives and similar repositories for meta-agents-research-environments
Users interested in meta-agents-research-environments are comparing it to the libraries listed below.
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆246 · Updated 5 months ago
- Code for the paper: "Learning to Reason without External Rewards" ☆364 · Updated 3 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆286 · Updated last week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Paper ☆269 · Updated 2 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training" ☆161 · Updated last week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file (see the config sketch after this list). ☆187 · Updated 7 months ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆102 · Updated 2 weeks ago
- A Gym for Agentic LLMs ☆300 · Updated last week
- AWM: Agent Workflow Memory ☆332 · Updated 8 months ago
- ☆116 · Updated 8 months ago
- ☆76 · Updated last month
- ☆218 · Updated 7 months ago
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆170 · Updated 3 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆364 · Updated this week
- Code for the paper 🌳 Tree Search for Language Model Agents ☆217 · Updated last year
- A simple unified framework for evaluating LLMs ☆251 · Updated 6 months ago
- Reproducible, flexible LLM evaluations ☆256 · Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆429 · Updated this week
- A benchmark list for the evaluation of large language models. ☆144 · Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆535 · Updated this week
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆260 · Updated 3 weeks ago
- Official repo for the paper "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆265 · Updated 5 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆97 · Updated 4 months ago
- ☆214 · Updated 2 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆294 · Updated this week
- Code and example data for the paper: "Rule Based Rewards for Language Model Safety" ☆200 · Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆241 · Updated 11 months ago
- ☆86 · Updated 4 months ago
- ☆321 · Updated 4 months ago
- ☆283 · Updated 2 months ago
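The Archon entry above mentions driving inference-time techniques from a JSON config. As a hedged illustration of that general pattern only (the schema, layer types, and function names here are invented for the example and are not Archon's real format):

```python
# Hypothetical sketch of a config-driven inference-time pipeline in the spirit
# of the Archon entry in the list above. The config schema and component names
# are illustrative assumptions, not Archon's actual API.
import json

CONFIG = json.loads("""
{
  "layers": [
    {"type": "generate", "models": ["model-a", "model-b"], "samples": 2},
    {"type": "rank", "keep_top": 1}
  ]
}
""")


def generate(models, samples, prompt):
    # Stand-in for sampling candidate answers from several models.
    return [f"{m} answer {i} to: {prompt}" for m in models for i in range(samples)]


def rank(candidates, keep_top):
    # Stand-in for a ranking technique; shortest-first is just a placeholder.
    return sorted(candidates, key=len)[:keep_top]


def run_pipeline(config, prompt):
    state = prompt
    for layer in config["layers"]:
        if layer["type"] == "generate":
            state = generate(layer["models"], layer["samples"], state)
        elif layer["type"] == "rank":
            state = rank(state, layer["keep_top"])
    return state


print(run_pipeline(CONFIG, "What is 2 + 2?"))
```

The point of the config-driven design is that swapping or reordering inference-time techniques means editing a JSON file, not the pipeline code.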