☆338 · Jun 19, 2024 · Updated last year
Alternatives and similar repositories for MLAgentBench
Users that are interested in MLAgentBench are comparing it to the libraries listed below.
- Official implementation of "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning" in ICML'24 ☆232 · Dec 3, 2024 · Updated last year
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆116 · Aug 17, 2025 · Updated 8 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,510 · Apr 24, 2026 · Updated last week
- AIDE: AI-Driven Exploration in the Space of Code. The machine Learning engineering agent that automates AI R&D. ☆1,245 · Apr 21, 2026 · Updated last week
- Redwood Research's transformer interpretability tools ☆15 · Apr 15, 2022 · Updated 4 years ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,377 · Feb 8, 2026 · Updated 2 months ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆35 · Oct 25, 2024 · Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆285 · Aug 19, 2023 · Updated 2 years ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆136 · Updated this week
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆248 · May 5, 2024 · Updated last year
- AIDE: the Machine Learning CodeGen Agent ☆25 · Oct 7, 2024 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆170 · Sep 18, 2025 · Updated 7 months ago
- ☆136 · Oct 16, 2025 · Updated 6 months ago
- ☆88 · Dec 15, 2023 · Updated 2 years ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model ☆25 · Oct 12, 2024 · Updated last year
- ☆68 · Mar 30, 2025 · Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆146 · Apr 11, 2024 · Updated 2 years ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? ☆151 · Aug 26, 2024 · Updated last year
- AgentTuning: Enabling Generalized Agent Abilities for LLMs ☆1,485 · Oct 31, 2023 · Updated 2 years ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents ☆556 · Oct 28, 2023 · Updated 2 years ago
- ☆19 · May 23, 2023 · Updated 2 years ago
- ☆14 · Jul 12, 2024 · Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆44 · Oct 28, 2024 · Updated last year
- [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks. ☆1,477 · Sep 9, 2025 · Updated 7 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆49 · Dec 22, 2023 · Updated 2 years ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆74 · Jun 25, 2024 · Updated last year
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆477 · Mar 19, 2024 · Updated 2 years ago
- ☆189 · Jan 27, 2025 · Updated last year
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆124 · Mar 31, 2025 · Updated last year
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · Oct 2, 2025 · Updated 7 months ago
- ☆2,896 · Feb 20, 2025 · Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆114 · Dec 12, 2024 · Updated last year
- A curated list of papers on LLMs and agents for scientific research and development ☆87 · Dec 11, 2024 · Updated last year
- Semi-automatic feature engineering process using Language Models and your dataset descriptions. Based on the paper "LLMs for Semi-Automat… ☆188 · Dec 20, 2024 · Updated last year
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆526 · Sep 6, 2024 · Updated last year
- ☆39 · May 2, 2024 · Updated last year
- FireAct: Toward Language Agent Fine-tuning ☆292 · Oct 22, 2023 · Updated 2 years ago
- ☆292 · Dec 4, 2024 · Updated last year
- [ICLR 2025] Automated Design of Agentic Systems ☆1,560 · Jan 28, 2025 · Updated last year