HumanEval-V / HumanEval-V-Benchmark
A Lightweight Visual Understanding and Reasoning Benchmark for Evaluating Large Multimodal Models through Coding Tasks
☆14 · Updated this week
Related projects
Alternatives and complementary repositories for HumanEval-V-Benchmark
- ☆54 · Updated 2 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆24 · Updated 4 months ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆22 · Updated last month
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆47 · Updated last month
- ☆12 · Updated 2 months ago
- ☆34 · Updated 9 months ago
- Official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆39 · Updated 4 months ago
- ☆20 · Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning ☆32 · Updated last month
- [ECCV 2024] Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆67 · Updated 11 months ago
- ☆27 · Updated last year
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards ☆44 · Updated 6 months ago
- [EMNLP 2024] Multi-modal reasoning problems via code generation ☆16 · Updated last month
- ☆23 · Updated 6 months ago
- ☆27 · Updated 9 months ago
- ☆38 · Updated last year
- [ICML'24] TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks ☆22 · Updated 2 months ago
- [NeurIPS 2024] Official implementation of the paper "Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs" ☆63 · Updated last month
- ☆33 · Updated last year
- ☆16 · Updated last month
- [NeurIPS 2024] Official repository for "MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models" ☆49 · Updated last week
- Implementation of the paper "Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Rea…" ☆18 · Updated 5 months ago
- ☆21 · Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆84 · Updated 5 months ago
- Code and data for the paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆60 · Updated 8 months ago
- Data, code, and models for contextual noncompliance ☆18 · Updated 4 months ago
- Evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆26 · Updated 4 months ago
- Repo for the paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆43 · Updated 3 weeks ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆30 · Updated 3 months ago
- Implementation of LeCo ☆27 · Updated 4 months ago