Reproducible Language Agent Research
☆34 · Jun 25, 2025 · Updated 9 months ago
Alternatives and similar repositories for open-agent-leaderboard
Users that are interested in open-agent-leaderboard are comparing it to the libraries listed below.
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆77 · Nov 20, 2025 · Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed … ☆11 · Sep 27, 2024 · Updated last year
- DataSciBench: An LLM Agent Benchmark for Data Science ☆55 · Jan 21, 2026 · Updated 2 months ago
- Research simulation toolkit for federated learning ☆13 · Nov 7, 2020 · Updated 5 years ago
- A unified robotic manipulation learning framework ☆21 · Sep 4, 2025 · Updated 6 months ago
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" ☆25 · Dec 14, 2025 · Updated 3 months ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆13 · Jan 12, 2026 · Updated 2 months ago
- [SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling ☆47 · Sep 26, 2025 · Updated 6 months ago
- my personal mcp server ☆13 · Apr 23, 2025 · Updated 11 months ago
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback ☆17 · Oct 15, 2025 · Updated 5 months ago
- ☆18 · Mar 19, 2025 · Updated last year
- Azure Command-Line Interface ☆12 · Dec 10, 2023 · Updated 2 years ago
- Code for "Unsupervised Cross-lingual Transfer of Word Embedding Spaces" in EMNLP 2018 ☆24 · Dec 29, 2018 · Updated 7 years ago
- [ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents ☆47 · Mar 7, 2026 · Updated 3 weeks ago
- ☆20 · Mar 3, 2025 · Updated last year
- ☆14 · Jul 5, 2024 · Updated last year
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆32 · Aug 5, 2025 · Updated 7 months ago
- Open Co Scientist aims to democratize scientific research by providing an open-source implementation of an AI co-scientist system. ☆15 · Mar 1, 2025 · Updated last year
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆38 · Jul 25, 2024 · Updated last year
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆20 · May 24, 2025 · Updated 10 months ago
- ☆19 · Nov 4, 2025 · Updated 4 months ago
- this is a trained yolov8n network that only detects people, at "eye-height", trained in a super basic way on COCO ☆13 · Dec 18, 2023 · Updated 2 years ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆14 · Feb 25, 2025 · Updated last year
- Instruction Following Eval ☆16 · Jan 16, 2025 · Updated last year
- AI-Rag-ChatBot is a complete project example with RAGChat and Next.js 14, using Upstash Vector Database, Upstash Qstash, Upstash Redis, D… ☆15 · Jul 10, 2025 · Updated 8 months ago
- Guide for setting up an autossh reverse tunnel that auto re-establishes ☆16 · Apr 13, 2019 · Updated 6 years ago
- Benchmarking LLM Inference Speeds ☆13 · Mar 3, 2026 · Updated 3 weeks ago
- [ACL 2025 Main] Open-source toolkit for automatic evaluation of text-to-image generation task, including training & test datasets and a d… ☆17 · Jul 5, 2025 · Updated 8 months ago
- ☆16 · Oct 27, 2024 · Updated last year
- Cog wrapper for playgroundai/playground-v2.5-1024px-aesthetic ☆17 · Nov 25, 2024 · Updated last year
- Notebooks for CS4305TU Regression Lectures ☆11 · Oct 14, 2022 · Updated 3 years ago
- [ICCV 2025] Dynamic-VLM ☆28 · Dec 16, 2024 · Updated last year
- An advanced research assistant that utilizes AI agents to generate novel research directions and analyze scientific literature. This plat… ☆16 · Feb 26, 2025 · Updated last year
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆34 · Jun 13, 2025 · Updated 9 months ago
- ☆13 · Apr 30, 2025 · Updated 11 months ago
- High-performance ASR tool using Faster Whisper, supporting custom models, multi-language transcription, and real-time processing feedback… ☆10 · Sep 17, 2025 · Updated 6 months ago
- Papers of Implicit Reasoning in LLMs. ☆24 · Mar 13, 2025 · Updated last year
- get the media stream from Dahua/Haikang IPC SDK, and demux the stream to video and audio ES ☆13 · Nov 15, 2015 · Updated 10 years ago