om-ai-lab / open-agent-leaderboard
Reproducible Language Agent Research
☆24 Updated 2 months ago
Alternatives and similar repositories for open-agent-leaderboard:
Users who are interested in open-agent-leaderboard are comparing it to the libraries listed below.
- ☆25 Updated 7 months ago
- Collection of model-centric MCP servers ☆14 Updated this week
- ☆47 Updated 4 months ago
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models ☆35 Updated 7 months ago
- ☆38 Updated 4 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated last year
- ☆32 Updated last year
- ☆93 Updated 3 months ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆44 Updated last month
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆101 Updated 2 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆35 Updated 2 months ago
- ☆27 Updated 2 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated 6 months ago
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents ☆88 Updated last month
- ☆37 Updated 2 years ago
- From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation ☆89 Updated this week
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆93 Updated this week
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆25 Updated last month
- ☆64 Updated 7 months ago
- Extensive Self-Contrast Enables Feedback-Free Language Model Alignment ☆21 Updated last year
- ☆28 Updated last year
- ☆63 Updated last month
- Agentic Knowledgeable Self-awareness ☆56 Updated 3 weeks ago
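- The simplest reproduction of R1 results on small models, illustrating the most important essence of O1-like models and DeepSeek R1. Think is all your need. Uses experiments to support the claim that, for strong reasoning ability, the content of the thinking process is the core of AGI/ASI. ☆45 Updated 3 months ago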
- ☆41 Updated 4 months ago
- ☆38 Updated 7 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆51 Updated 4 months ago
- A collection of strong multimodal models for building multimodal AGI agents ☆42 Updated 10 months ago
- ☆86 Updated this week
- [IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning ☆22 Updated last year