☆342 Jun 19, 2024 Updated last year
Alternatives and similar repositories for MLAgentBench
Users who are interested in MLAgentBench are comparing it to the repositories listed below.
- Official implementation of "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning" in ICML'24 ☆233 Dec 3, 2024 Updated last year
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆117 Aug 17, 2025 Updated 9 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,566 Apr 24, 2026 Updated last month
- AIDE: AI-Driven Exploration in the Space of Code. The machine Learning engineering agent that automates AI R&D. ☆1,314 May 2, 2026 Updated last month
- Redwood Research's transformer interpretability tools ☆15 Apr 15, 2022 Updated 4 years ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,472 Feb 8, 2026 Updated 4 months ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆35 Oct 25, 2024 Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆287 Aug 19, 2023 Updated 2 years ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆139 Apr 29, 2026 Updated last month
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆248 May 5, 2024 Updated 2 years ago
- ML-Dev-Bench is a benchmark for evaluating AI agents against various ML development tasks. ☆42 Mar 10, 2026 Updated 3 months ago
- AIDE: the Machine Learning CodeGen Agent ☆25 Oct 7, 2024 Updated last year
- Self-Alignment with Principle-Following Reward Models ☆170 Sep 18, 2025 Updated 8 months ago
- ☆139 Oct 16, 2025 Updated 7 months ago
- ☆89 Dec 15, 2023 Updated 2 years ago
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model ☆27 Oct 12, 2024 Updated last year
- ☆69 Mar 30, 2025 Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆146 Apr 11, 2024 Updated 2 years ago
- AgentTuning: Enabling Generalized Agent Abilities for LLMs ☆1,497 Oct 31, 2023 Updated 2 years ago
- [ICLR 2024] Lemur: Open Foundation Models for Language Agents ☆556 Oct 28, 2023 Updated 2 years ago
- ☆19 May 23, 2023 Updated 3 years ago
- ☆55 Sep 9, 2023 Updated 2 years ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? ☆152 Aug 26, 2024 Updated last year
- ☆14 Jul 12, 2024 Updated last year
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆44 Oct 28, 2024 Updated last year
- [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks. ☆1,485 Sep 9, 2025 Updated 9 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation ☆50 Dec 22, 2023 Updated 2 years ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆74 Jun 25, 2024 Updated last year
- ☆189 Jun 2, 2026 Updated last week
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆478 Mar 19, 2024 Updated 2 years ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆124 Mar 31, 2025 Updated last year
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 Oct 2, 2025 Updated 8 months ago
- ☆2,901 Feb 20, 2025 Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆114 Dec 12, 2024 Updated last year
- A curated list of papers on LLMs and agents for scientific research and development ☆91 Dec 11, 2024 Updated last year
- ☆39 May 2, 2024 Updated 2 years ago
- FireAct: Toward Language Agent Fine-tuning ☆294 Oct 22, 2023 Updated 2 years ago
- [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆548 Sep 6, 2024 Updated last year
- ☆303 Dec 4, 2024 Updated last year