A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆14Feb 25, 2025Updated last year
Alternatives and similar repositories for HumanEval-V-Benchmark
Users who are interested in HumanEval-V-Benchmark are comparing it to the repositories listed below.
- Source code for ISSTA'24 paper "AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation"☆12Oct 21, 2024Updated last year
- Testing on target datasets with the pre-trained CodeBert model, either after fine-tuning or directly☆14Oct 19, 2021Updated 4 years ago
- EvoEval: Evolving Coding Benchmarks via LLM☆81Apr 6, 2024Updated last year
- [ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"☆39Apr 7, 2025Updated 11 months ago
- Source code embeddings for various programming languages☆17Jul 11, 2018Updated 7 years ago
- ☆25Aug 2, 2025Updated 7 months ago
- This is the official implementation for the paper 'Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases'☆14Oct 4, 2023Updated 2 years ago
- ☆13Jan 22, 2025Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆102Oct 23, 2024Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- This repo illustrates how to evaluate the artifacts in the paper An Extensive Study on Pre-trained Models for Program Understanding and G…☆27Aug 12, 2022Updated 3 years ago
- [ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models☆17Sep 17, 2025Updated 6 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Dec 19, 2024Updated last year
- ☆14Jul 17, 2025Updated 8 months ago
- PyTorch tips and tutorials☆11Apr 17, 2023Updated 2 years ago
- ☆12Jan 17, 2024Updated 2 years ago
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code☆80Jun 16, 2024Updated last year
- Replication Package for "Compressing Pre-trained Models of Code into 3 MB", ASE 2022☆30Oct 10, 2024Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- ☆40Aug 4, 2025Updated 7 months ago
- (AAAI 2026) OSVBench, a new benchmark for evaluating Large Language Models (LLMs) in generating complete specification code pertaining to…☆13May 13, 2025Updated 10 months ago
- ☆10Apr 15, 2023Updated 2 years ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Updated this week
- Multimodal Large Language Models for Code Generation under Multimodal Scenarios☆223Mar 23, 2026Updated last week
- [ICML 2023] "Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability"☆18Jul 7, 2023Updated 2 years ago
- [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing☆13Feb 9, 2025Updated last year
- ☆11May 14, 2024Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆24Nov 29, 2024Updated last year
- Model your data with the Von Mises-Fisher distribution in Python☆13Feb 1, 2016Updated 10 years ago
- [ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing☆27Jan 27, 2026Updated 2 months ago
- This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".☆24Oct 28, 2024Updated last year
- ☆12Aug 9, 2023Updated 2 years ago
- ☆17Mar 22, 2021Updated 5 years ago
- ☆16May 25, 2022Updated 3 years ago
- [ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities☆39Sep 26, 2025Updated 6 months ago
- the paper "Geometry-aware Instance-reweighted Adversarial Training" ICLR 2021 oral☆59Apr 13, 2021Updated 4 years ago
- ☆13Nov 8, 2022Updated 3 years ago
- ☆44Dec 8, 2025Updated 3 months ago
- Code for ICSE'24 Paper☆14Apr 21, 2024Updated last year