om-ai-lab / open-agent-leaderboard
Reproducible Language Agent Research
☆33 · Jun 25, 2025 · Updated 7 months ago
Alternatives and similar repositories for open-agent-leaderboard
Users who are interested in open-agent-leaderboard are comparing it to the repositories listed below:
- A suite of multimodal language models that are powerful and efficient ☆17 · Jan 13, 2025 · Updated last year
- A collection of strong multimodal models for building multimodal AGI agents ☆44 · Jul 9, 2024 · Updated last year
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback ☆17 · Oct 15, 2025 · Updated 4 months ago
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed … ☆11 · Sep 27, 2024 · Updated last year
- DataSciBench: An LLM Agent Benchmark for Data Science ☆50 · Jan 21, 2026 · Updated 3 weeks ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- [SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling ☆42 · Sep 26, 2025 · Updated 4 months ago
- ☆18 · Mar 19, 2025 · Updated 10 months ago
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" ☆24 · Dec 14, 2025 · Updated 2 months ago
- ☆20 · Mar 3, 2025 · Updated 11 months ago
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ☆32 · Feb 1, 2026 · Updated 2 weeks ago
- A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024) ☆62 · May 7, 2024 · Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆76 · Nov 20, 2025 · Updated 2 months ago
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆19 · May 24, 2025 · Updated 8 months ago
- A unified robotic manipulation learning framework ☆21 · Sep 4, 2025 · Updated 5 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing ☆31 · Oct 2, 2025 · Updated 4 months ago
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆29 · Jun 3, 2025 · Updated 8 months ago
- Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models ☆29 · Oct 6, 2025 · Updated 4 months ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆24 · Nov 29, 2024 · Updated last year
- ☆20 · Nov 4, 2025 · Updated 3 months ago
- ☆20 · Sep 2, 2024 · Updated last year
- 【HACKATHON 预备营】飞桨启航计划集训营 (HACKATHON Preparatory Camp: PaddlePaddle Launch Program Training Camp) ☆17 · Dec 22, 2025 · Updated last month
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆13 · Jun 28, 2025 · Updated 7 months ago
- Source code of paper: A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models (ICML 2025) ☆35 · Apr 2, 2025 · Updated 10 months ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Feb 8, 2026 · Updated last week
- [ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities ☆38 · Sep 26, 2025 · Updated 4 months ago
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆35 · Jun 13, 2025 · Updated 8 months ago
- Official PyTorch implementation for "MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens… ☆45 · Jun 12, 2025 · Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆14 · Feb 25, 2025 · Updated 11 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆46 · Jan 29, 2026 · Updated 2 weeks ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Aug 5, 2025 · Updated 6 months ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback ☆38 · Jun 24, 2025 · Updated 7 months ago
- Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023 ☆39 · Oct 16, 2025 · Updated 4 months ago
- [ACL 2025] Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging ☆38 · Jun 4, 2025 · Updated 8 months ago
- Large-scale semi-supervised framework with 1B+ labeled masks from 48K+ datasets with test-time adaptation to new domains (ICCV25) ☆43 · Dec 28, 2025 · Updated last month