facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.
☆411Updated last month
Alternatives and similar repositories for meta-agents-research-environments
Users interested in meta-agents-research-environments are comparing it to the libraries listed below.
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆254Updated 8 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆334Updated 2 months ago
- 🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…☆351Updated last month
- A Gym for Agentic LLMs☆412Updated last week
- Code for the paper: "Learning to Reason without External Rewards"☆385Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆190Updated 10 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆166Updated 2 months ago
- ☆317Updated 5 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]☆611Updated 5 months ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆117Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆609Updated this week
- ☆209Updated last week
- AWM: Agent Workflow Memory☆376Updated 2 weeks ago
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents☆224Updated 5 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆335Updated 3 weeks ago
- ☆226Updated 10 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents☆218Updated last year
- ☆116Updated 11 months ago
- A benchmark list for the evaluation of large language models.☆154Updated 4 months ago
- Reproducible, flexible LLM evaluations☆316Updated last month
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents☆495Updated last week
- Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcemen…☆545Updated 3 months ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"☆650Updated 9 months ago
- ☆88Updated 2 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example☆392Updated last month
- (ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019…☆202Updated 2 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]☆381Updated last year
- ☆308Updated 3 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆147Updated last year
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning☆281Updated 3 months ago