gersteinlab / ML-Bench
The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://arxiv.org/abs/2311.09835)
☆292Updated 4 months ago
Alternatives and similar repositories for ML-Bench:
Users who are interested in ML-Bench are comparing it to the repositories listed below
- MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…☆174Updated last week
- DeepRetrieval - Hacking 🔥Real Search Engines and Text/Data Retrievers with LLM + RL☆201Updated this week
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging☆133Updated 2 weeks ago
- Code Efficiency Benchmark☆77Updated 2 months ago
- PyTorch library for relational table learning with LLMs.☆421Updated this week
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework☆176Updated last week
- R1-like Computer-use Agent☆63Updated last week
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS☆1,158Updated this week
- BIRD-CRITIC 1.0: Can Large Language Models Solve USER SQL Issues in Real-World Database Applications?☆396Updated last month
- [EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models☆63Updated 5 months ago
- [ACL2024 Findings] Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM☆56Updated 3 weeks ago
- The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks"☆154Updated 3 months ago
- [ICLR Workshop 2025] An official source code for paper "GuardReasoner: Towards Reasoning-based LLM Safeguards".☆130Updated 3 weeks ago
- Source code for ICLR2025 paper "NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation".☆71Updated 3 weeks ago
- SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing☆127Updated 2 weeks ago
- JittorGeometric is a Jittor-based graph machine learning library.☆154Updated last week
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.☆197Updated 9 months ago
- A tiny PyTorch-like framework for learning☆56Updated 8 months ago
- Codebase for Iterative DPO Using Rule-based Rewards☆230Updated last week
- Code for "Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering"☆16Updated last month
- A Contamination-free Multi-task Language Understanding Benchmark☆114Updated 2 months ago
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…☆164Updated 3 months ago
- DPO-Shift: Shifting the Distribution of Direct Preference Optimization☆11Updated 3 weeks ago
- Multilingual Corpus of Web Fiction☆191Updated 9 months ago
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024)☆187Updated 3 weeks ago
- Create your self-hosted, open-source Operator model.☆91Updated last week