A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆15Feb 25, 2025Updated last year
Alternatives and similar repositories for HumanEval-V-Benchmark
Users who are interested in HumanEval-V-Benchmark are comparing it to the repositories listed below.
- EvoEval: Evolving Coding Benchmarks via LLM☆81Apr 6, 2024Updated 2 years ago
- Replication package for ISSTA2023 paper - Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond☆23Apr 9, 2023Updated 3 years ago
- [ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"☆41Apr 7, 2025Updated last year
- This repo is for our submission for ICSE 2025.☆20Jun 12, 2024Updated last year
- ☆26Aug 2, 2025Updated 8 months ago
- This is the official implementation of the paper 'Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases'☆14Oct 4, 2023Updated 2 years ago
- We introduce FixEval, a dataset for competitive programming bug fixing along with a comprehensive test suite and show the necessity of e…☆26Aug 31, 2022Updated 3 years ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆102Oct 23, 2024Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- ☆49Jul 24, 2022Updated 3 years ago
- ☆10Mar 13, 2023Updated 3 years ago
- This repo illustrates how to evaluate the artifacts in the paper An Extensive Study on Pre-trained Models for Program Understanding and G…☆27Aug 12, 2022Updated 3 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆17Dec 19, 2024Updated last year
- ☆14Jul 17, 2025Updated 9 months ago
- [ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models☆18Sep 17, 2025Updated 7 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- Exercises for the Dafny Tutorial☆14May 21, 2018Updated 7 years ago
- ☆40Apr 6, 2026Updated last week
- ☆14Aug 18, 2025Updated 8 months ago
- ☆10Apr 15, 2023Updated 3 years ago
- ☆33Sep 14, 2025Updated 7 months ago
- Detection and Classification of UI Elements of Web pages and Apps from Wireframe Sketches☆10Oct 9, 2023Updated 2 years ago
- KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding☆67Apr 5, 2026Updated 2 weeks ago
- [ICML 2023] "Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability"☆18Jul 7, 2023Updated 2 years ago
- [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing☆13Feb 9, 2025Updated last year
- ☆11May 14, 2024Updated last year
- [ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing☆27Jan 27, 2026Updated 2 months ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆25Nov 29, 2024Updated last year
- Model your data with the Von Mises-Fisher distribution in Python☆13Feb 1, 2016Updated 10 years ago
- Multimodal Large Language Models for Code Generation under Multimodal Scenarios☆236Apr 11, 2026Updated last week
- ☆12Aug 9, 2023Updated 2 years ago
- ☆17Mar 22, 2021Updated 5 years ago
- ☆16May 25, 2022Updated 3 years ago
- The paper "Geometry-aware Instance-reweighted Adversarial Training" (ICLR 2021 oral)☆59Apr 13, 2021Updated 5 years ago
- [ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities☆39Sep 26, 2025Updated 6 months ago
- [NeurIPS 2025] CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning☆17Jan 24, 2026Updated 2 months ago
- ☆13Jan 19, 2026Updated 3 months ago
- ☆13Nov 8, 2022Updated 3 years ago
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts☆17Mar 11, 2025Updated last year