gersteinlab / ML-Bench
The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://arxiv.org/abs/2311.09835)
☆355 Updated this week
Related projects
Alternatives and complementary repositories for ML-Bench
- The repository for the paper titled "Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks" ☆184 Updated 3 weeks ago
- Code Efficiency Benchmark ☆86 Updated this week
- Multilingual Corpus of Web Fiction ☆216 Updated 4 months ago
- One-stop data intelligence agent, providing insights from all mainstream data formats in a single dialogue box, including documents, data… ☆504 Updated 2 weeks ago
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning (NeurIPS 2024) ☆168 Updated last week
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that … ☆117 Updated last year
- Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin… ☆161 Updated this week
- PyTorch Library for Relational Table Learning with LLMs. ☆283 Updated this week
- Unified KV Cache Compression Methods for LLMs ☆728 Updated this week
- ☆17 Updated 2 years ago
- Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators" ☆225 Updated 3 months ago
- TxBKG - Knowledge Graph Generation for Any PDFs ☆223 Updated last month
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models ☆129 Updated 2 weeks ago
- Harnessing the Power of AI to Navigate the Information Age – Uncovering Truth, Promoting Transparency, and Championing Fact-Based Discour… ☆208 Updated last year
- ☆223 Updated 4 months ago
- Benchmarking LLMs via Uncertainty Quantification ☆221 Updated 9 months ago
- This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control. ☆66 Updated last month
- A tiny PyTorch structure for learning; a minimal implementation of PyTorch ☆52 Updated 4 months ago
- ☆115 Updated last year
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆269 Updated 7 months ago
- ☆607 Updated 2 months ago
- OpenAI API Cost Tracker ☆22 Updated 8 months ago
- Completed this competition in collaboration with Jiang Yan (https://github.com/jy1993) and Guan Shuicheng (https://github.com/guanshuicheng… ☆505 Updated 2 weeks ago
- [NeurIPS 2022] Official Code for REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering ☆134 Updated 2 months ago
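- Train an LLM from scratch with DeepSpeed, through the pretrain and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions ☆155 Updated 4 months ago
- Mixed precision inference by TensorRT-LLM ☆93 Updated 3 weeks ago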
- DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models ☆134 Updated 2 months ago
- ☆204 Updated 5 months ago
- Large-Scale Selfie Video Dataset (L-SVD): A Benchmark for Emotion Recognition ☆407 Updated 3 months ago
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc. ☆182 Updated last week