om-ai-lab / open-agent-leaderboard
Reproducible Language Agent Research
☆26 Updated 2 months ago
Alternatives and similar repositories for open-agent-leaderboard
Users who are interested in open-agent-leaderboard are comparing it to the repositories listed below.
- ☆38 Updated 5 months ago
- Collection of model-centric MCP servers ☆17 Updated 2 weeks ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆45 Updated last month
- Verifiers for LLM Reinforcement Learning ☆56 Updated last month
- ☆47 Updated 5 months ago
- ☆24 Updated 8 months ago
- ☆13 Updated 5 months ago
- ☆36 Updated 2 years ago
- An open-source toolkit helping developers build natural language database query solutions ☆14 Updated last month
- ☆83 Updated 3 weeks ago
- ☆55 Updated 6 months ago
- II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset ☆17 Updated last month
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆94 Updated 2 months ago
- ☆56 Updated 6 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated last year
- ☆68 Updated 8 months ago
- Official Implementation of APB (ACL 2025 main) ☆28 Updated 3 months ago
- ☆41 Updated 5 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆101 Updated 3 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆53 Updated 2 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated 7 months ago
- Efficient Agent Training for Computer Use ☆94 Updated last week
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 Updated 2 months ago
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models ☆35 Updated 8 months ago
- ☆50 Updated this week
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆42 Updated 3 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆68 Updated last month
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆35 Updated last month
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆15 Updated 3 weeks ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆105 Updated 7 months ago