gersteinlab / ML-Bench
The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://arxiv.org/abs/2311.09835)
☆301 Updated 6 months ago
Alternatives and similar repositories for ML-Bench
Users who are interested in ML-Bench compare it to the repositories listed below.
- From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery ☆126 Updated this week
- [EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models ☆70 Updated 3 weeks ago
- MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a… ☆176 Updated last month
- When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification ☆276 Updated this week
- Official implementation of RARE: Retrieval-Augmented Reasoning Modeling ☆171 Updated last week
- R1-like Computer-use Agent ☆73 Updated 2 months ago
- Code Efficiency Benchmark ☆78 Updated last month
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆136 Updated 2 months ago
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,185 Updated 2 months ago
- Official Repository for Paper: The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning ☆49 Updated last month
- BIRD-CRITIC 1.0: Can Large Language Models Solve USER SQL Issues in Real-World Database Applications? ☆572 Updated last month
- SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing ☆143 Updated 2 months ago
- Codebase for Iterative DPO Using Rule-based Rewards ☆245 Updated last month
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions, all in one framework ☆226 Updated 2 months ago
- DeepRetrieval - 🔥 Training Search Agent with Retrieval Outcomes via Reinforcement Learning ☆521 Updated last week
- ☆94 Updated last week
- Pytorch Library for Relational Table Learning with LLMs. ☆429 Updated 2 weeks ago
- Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning ☆70 Updated last month
- A clean and extensible agentic RAG system with modular implementation. ☆99 Updated last month
- This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control. ☆60 Updated 7 months ago
- Source code for ICLR2025 paper "NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation". ☆75 Updated last month
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that … ☆93 Updated last year
- SQL-o1: A Self-Reward Heuristic Dynamic Search Method for Text-to-SQL ☆106 Updated 2 weeks ago
- MAKGED is the first multi-agent framework for collaborative error detection in knowledge graphs. ☆28 Updated 3 months ago
- ☆60 Updated 2 months ago
- JittorGeometric is a Jittor-based graph machine learning library. ☆155 Updated last week
- Code for "Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning" ☆27 Updated this week
- ☆45 Updated 2 months ago
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024) ☆201 Updated last month
- [ICML2025] Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment ☆94 Updated 2 weeks ago