om-ai-lab / open-agent-leaderboard
Reproducible Language Agent Research
☆27 Updated 3 months ago
Alternatives and similar repositories for open-agent-leaderboard
Users who are interested in open-agent-leaderboard are comparing it to the repositories listed below.
- ☆24 Updated 9 months ago
- Verifiers for LLM Reinforcement Learning ☆60 Updated 2 months ago
- ☆41 Updated 6 months ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆45 Updated 2 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆18 Updated last month
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated last week
- ☆65 Updated 2 months ago
- Official Implementation of APB (ACL 2025 main) ☆28 Updated 4 months ago
- ☆56 Updated 6 months ago
- ☆20 Updated 3 months ago
- Efficient Agent Training for Computer Use ☆106 Updated 3 weeks ago
- ☆47 Updated 2 weeks ago
- Reasoning by Communicating with Agents ☆29 Updated last month
- ☆29 Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆25 Updated 3 months ago
- ☆38 Updated 6 months ago
- Data preparation code for CrystalCoder 7B LLM ☆45 Updated last year
- [ICLR'25] ApolloMoE: Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts ☆44 Updated 7 months ago
- ☆50 Updated 3 weeks ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆93 Updated 2 weeks ago
- ☆36 Updated 2 years ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" ☆54 Updated 4 months ago
- ☆86 Updated last month
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆51 Updated 6 months ago
- ☆13 Updated 6 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆72 Updated last week
- ☆41 Updated this week
- ☆16 Updated 3 months ago
- Code for ScribeAgent paper ☆58 Updated 3 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆50 Updated this week