Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike static benchmarks, this platform introduces evolving environments where agents must adapt their strategies as new information becomes available, mirroring real-world challenges.
☆523 · Jun 20, 2026 · Updated last week
Alternatives and similar repositories for meta-agents-research-environments
Users that are interested in meta-agents-research-environments are comparing it to the libraries listed below.
- ☆26 · Oct 9, 2025 · Updated 8 months ago
- ☆63 · Jun 2, 2026 · Updated 3 weeks ago
- [TMLR 2026] A Searching-based Agent Model for Open-Domain Open-Ended Question Answering ☆39 · Jun 20, 2025 · Updated last year
- [ICLR 2026] AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆49 · Apr 17, 2026 · Updated 2 months ago
- Academic page for LimSim++ ☆11 · Mar 19, 2024 · Updated 2 years ago
- A Practitioner's Guide to M(eow)ti Turn Agentic ReinfOrcement learning ☆83 · Jan 16, 2026 · Updated 5 months ago
- Measuring Thinking Efficiency in Reasoning Models - Research Repository ☆39 · Dec 2, 2025 · Updated 6 months ago
- [NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge ☆112 · May 17, 2026 · Updated last month
- Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi… ☆806 · May 30, 2026 · Updated 3 weeks ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 · Feb 11, 2025 · Updated last year
- LIMI: Less is More for Agency ☆162 · Oct 14, 2025 · Updated 8 months ago
- Data Augmentation for Supervised Graph Outlier Detection with Latent Diffusion Models ☆15 · Sep 3, 2025 · Updated 9 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆2,032 · Updated this week
- ☆24 · Mar 1, 2025 · Updated last year
- Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model ☆13 · Dec 29, 2024 · Updated last year
- ☆11 · Oct 25, 2024 · Updated last year
- A Gym for Agentic LLMs ☆497 · Jan 21, 2026 · Updated 5 months ago
- AllenAI's post-training codebase ☆3,775 · Updated this week
- Bayes-Adaptive RL for LLM Reasoning ☆45 · May 28, 2025 · Updated last year
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ☆26 · Aug 8, 2024 · Updated last year
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆22,173 · Updated this week
- Our library for RL environments + evals ☆4,233 · Updated this week
- The original Shared Recurrent Memory Transformer implementation ☆36 · Jul 11, 2025 · Updated 11 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆660 · Updated this week
- An enterprise deep research benchmark ☆41 · Apr 22, 2026 · Updated 2 months ago
- A benchmark for evaluating LLMs on open-ended CS problems. Exploring the Next Frontier of Computer Science. ☆246 · Jun 23, 2026 · Updated last week
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆34 · Mar 7, 2025 · Updated last year
- Code for the paper 🌳 Tree Search for Language Model Agents ☆223 · Jul 25, 2024 · Updated last year
- ☆22 · May 3, 2025 · Updated last year
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆164 · Jun 22, 2026 · Updated last week
- ☆32 · Jun 5, 2025 · Updated last year
- Simple repository for training small reasoning models ☆52 · Feb 17, 2026 · Updated 4 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,594 · Apr 24, 2026 · Updated 2 months ago
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆607 · Aug 10, 2025 · Updated 10 months ago
- Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI. ☆271 · Oct 4, 2025 · Updated 8 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆72 · Nov 14, 2024 · Updated last year
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆398 · Jan 19, 2025 · Updated last year
- Lean evaluation and metaprogramming utilities for provers. ☆119 · Jun 3, 2026 · Updated 3 weeks ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated last year