facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.
☆305 · Updated this week
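To make the "evolving environment" idea concrete, here is a minimal sketch of the kind of evaluation loop the description implies. Everything below (class names, methods, events) is an illustrative assumption, not the platform's actual API:

```python
# Illustrative sketch only: hypothetical stand-ins, not the actual
# meta-agents-research-environments API. It shows the general shape of a
# dynamic evaluation loop where new information arrives mid-episode.
import random
from dataclasses import dataclass, field


@dataclass
class DynamicEnvironment:
    """A toy environment whose task state can change while an episode runs."""
    pending_events: list = field(default_factory=lambda: [
        "user adds a new constraint",
        "an app returns fresh data",
    ])
    log: list = field(default_factory=list)

    def step(self, action: str) -> str:
        self.log.append(action)
        # New information may appear after the agent has already acted,
        # so a fixed plan made at t=0 can become stale.
        if self.pending_events and random.random() < 0.5:
            return self.pending_events.pop(0)
        return "no new events"


def run_episode(env: DynamicEnvironment, max_steps: int = 5) -> list:
    plan = ["gather context", "act on task"]
    observations = []
    for _ in range(max_steps):
        action = plan.pop(0) if plan else "finish"
        obs = env.step(action)
        observations.append(obs)
        if obs != "no new events":
            # The agent must revise its strategy when the environment evolves.
            plan.insert(0, f"replan for: {obs}")
        if action == "finish":
            break
    return observations


if __name__ == "__main__":
    random.seed(0)
    print(run_episode(DynamicEnvironment()))
```

The static-benchmark analogue would score a single fixed plan; here the score depends on whether the agent replans when mid-episode events invalidate its initial strategy.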
Alternatives and similar repositories for meta-agents-research-environments
Users interested in meta-agents-research-environments are comparing it to the libraries listed below.
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆246 · Updated 5 months ago
- Code for the paper: "Learning to Reason without External Rewards" ☆364 · Updated 3 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆286 · Updated last week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents", ACL'24 Best Resource Paper ☆269 · Updated 2 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training" ☆161 · Updated last week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file (see the config sketch after this list). ☆187 · Updated 7 months ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆102 · Updated 2 weeks ago
- A Gym for Agentic LLMs ☆300 · Updated last week
- AWM: Agent Workflow Memory ☆332 · Updated 8 months ago
- ☆116 · Updated 8 months ago
- ☆76 · Updated last month
- ☆218 · Updated 7 months ago
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆170 · Updated 3 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆364 · Updated this week
- Code for the paper 🌳 Tree Search for Language Model Agents ☆217 · Updated last year
- A simple unified framework for evaluating LLMs ☆251 · Updated 6 months ago
- Reproducible, flexible LLM evaluations ☆256 · Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆429 · Updated this week
- A benchmark list for the evaluation of large language models. ☆144 · Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆535 · Updated this week
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆260 · Updated 3 weeks ago
- Official repo for the paper "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆265 · Updated 5 months ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory ☆97 · Updated 4 months ago
- ☆214 · Updated 2 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆294 · Updated this week
- Code and example data for the paper: "Rule Based Rewards for Language Model Safety" ☆200 · Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆241 · Updated 11 months ago
- ☆86 · Updated 4 months ago
- ☆321 · Updated 4 months ago
- ☆283 · Updated 2 months ago
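The Archon entry above mentions driving inference-time techniques from a JSON config. As a hedged illustration of that general pattern only (the schema, layer types, and function names here are invented for the example and are not Archon's real format):

```python
# Hypothetical sketch of a config-driven inference-time pipeline in the spirit
# of the Archon entry in the list above. The config schema and component names
# are illustrative assumptions, not Archon's actual API.
import json

CONFIG = json.loads("""
{
  "layers": [
    {"type": "generate", "models": ["model-a", "model-b"], "samples": 2},
    {"type": "rank", "keep_top": 1}
  ]
}
""")


def generate(models, samples, prompt):
    # Stand-in for sampling candidate answers from several models.
    return [f"{m} answer {i} to: {prompt}" for m in models for i in range(samples)]


def rank(candidates, keep_top):
    # Stand-in for a ranking technique; shortest-first is just a placeholder.
    return sorted(candidates, key=len)[:keep_top]


def run_pipeline(config, prompt):
    state = prompt
    for layer in config["layers"]:
        if layer["type"] == "generate":
            state = generate(layer["models"], layer["samples"], state)
        elif layer["type"] == "rank":
            state = rank(state, layer["keep_top"])
    return state


print(run_pipeline(CONFIG, "What is 2 + 2?"))
```

The point of the config-driven design is that swapping or reordering inference-time techniques means editing a JSON file, not the pipeline code.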