facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.
☆ 397 · Updated last month
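What "evolving environment" means in practice can be sketched with a toy event-driven loop: new observations keep arriving on a schedule while the agent acts, so a plan fixed at step 0 goes stale. The sketch below is illustrative only; the class and function names (`EvolvingEnvironment`, `run_episode`, etc.) are hypothetical placeholders, not the platform's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch only: these names are illustrative placeholders,
# not the actual meta-agents-research-environments API.

@dataclass
class EvolvingEnvironment:
    """An environment whose state keeps changing while the agent acts."""
    scheduled_events: List[Tuple[int, str]]            # (timestep, event) pairs injected over time
    observations: List[str] = field(default_factory=list)
    time: int = 0

    def step(self, action: str) -> List[str]:
        """Apply the agent's action, advance time, and surface any newly due events."""
        self.time += 1
        new_obs = [f"t={self.time}: result of {action!r}"]
        new_obs += [event for t, event in self.scheduled_events if t == self.time]
        self.observations.extend(new_obs)
        return new_obs


def run_episode(policy: Callable[[List[str]], str],
                env: EvolvingEnvironment,
                horizon: int) -> List[Tuple[str, List[str]]]:
    """Roll out the agent; new information can arrive at any step, so the policy must adapt."""
    trace = []
    for _ in range(horizon):
        action = policy(env.observations)               # decide using everything observed so far
        trace.append((action, env.step(action)))
    return trace


if __name__ == "__main__":
    env = EvolvingEnvironment(scheduled_events=[(2, "new email arrives"), (4, "meeting moved")])
    policy = lambda obs: "check_inbox" if any("email" in o for o in obs) else "wait"
    for action, new_obs in run_episode(policy, env, horizon=5):
        print(action, new_obs)
```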
Alternatives and similar repositories for meta-agents-research-environments
Users interested in meta-agents-research-environments are comparing it to the libraries listed below.
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks · ☆ 254 · Updated 7 months ago
- AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… · ☆ 335 · Updated last month
- AWM: Agent Workflow Memory · ☆ 370 · Updated 10 months ago
- Code for the paper: "Learning to Reason without External Rewards" · ☆ 382 · Updated 5 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning · ☆ 328 · Updated last month
- Code for the paper 🌳 Tree Search for Language Model Agents · ☆ 216 · Updated last year
- (ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019… · ☆ 196 · Updated last month
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] · ☆ 601 · Updated 4 months ago
- ☆ 202 · Updated 2 weeks ago
- A Gym for Agentic LLMs · ☆ 404 · Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. · ☆ 582 · Updated last month
- ☆ 304 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. · ☆ 189 · Updated 9 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). · ☆ 331 · Updated last week
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" · ☆ 631 · Updated 9 months ago
- Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcemen… · ☆ 529 · Updated 3 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training" · ☆ 162 · Updated last month
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents · ☆ 485 · Updated last week
- [COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents · ☆ 210 · Updated 5 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example · ☆ 385 · Updated 3 weeks ago
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory · ☆ 225 · Updated 6 months ago
- ☆ 226 · Updated 9 months ago
- A benchmark list for the evaluation of large language models. · ☆ 152 · Updated 3 months ago
- ☆ 295 · Updated 3 months ago
- Complex Function Calling Benchmark. · ☆ 157 · Updated 10 months ago
- A simple unified framework for evaluating LLMs · ☆ 257 · Updated 8 months ago
- ☆ 117 · Updated 10 months ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents · ☆ 515 · Updated 3 weeks ago
- ☆ 85 · Updated last month
- LOFT: A 1 Million+ Token Long-Context Benchmark · ☆ 220 · Updated 6 months ago