A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆15Feb 25, 2025Updated last year
Alternatives and similar repositories for HumanEval-V-Benchmark
Users who are interested in HumanEval-V-Benchmark are comparing it to the libraries listed below.
- EvoEval: Evolving Coding Benchmarks via LLM☆84Apr 6, 2024Updated 2 years ago
- ☆25Aug 2, 2025Updated 10 months ago
- ☆14Jan 22, 2025Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆103Oct 23, 2024Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- [NeurIPS 2023] "Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation"☆11Oct 6, 2023Updated 2 years ago
- ☆10Mar 13, 2023Updated 3 years ago
- UICrit is a dataset containing human-generated natural language design critiques, corresponding bounding boxes for each critique, and des…☆25Nov 19, 2024Updated last year
- This repo illustrates how to evaluate the artifacts in the paper An Extensive Study on Pre-trained Models for Program Understanding and G…☆27Aug 12, 2022Updated 3 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆17Dec 19, 2024Updated last year
- ☆14Jul 17, 2025Updated 11 months ago
- PyTorch tips and tutorials☆12Apr 17, 2023Updated 3 years ago
- ☆12Jan 17, 2024Updated 2 years ago
- [ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models☆20Sep 17, 2025Updated 9 months ago
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code☆79Jun 16, 2024Updated 2 years ago
- Replication Package for "Compressing Pre-trained Models of Code into 3 MB", ASE 2022☆30Oct 10, 2024Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆25Oct 17, 2024Updated last year
- ☆40Apr 6, 2026Updated 2 months ago
- ☆17Aug 18, 2025Updated 10 months ago
- ☆10Apr 15, 2023Updated 3 years ago
- KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding☆68Apr 5, 2026Updated 2 months ago
- [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing☆13Feb 9, 2025Updated last year
- [ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing☆29May 11, 2026Updated last month
- ☆10May 14, 2024Updated 2 years ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆26Nov 29, 2024Updated last year
- Multimodal Large Language Models for Code Generation under Multimodal Scenarios☆251Jun 8, 2026Updated last week
- [ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities☆41Sep 26, 2025Updated 8 months ago
- ☆14Jan 19, 2026Updated 5 months ago
- ☆44Dec 8, 2025Updated 6 months ago
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts☆18Mar 11, 2025Updated last year
- This is the tool released in ICSE 2024 paper "Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Er…☆17Jun 5, 2023Updated 3 years ago
- Implementation for AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications☆38Mar 24, 2026Updated 2 months ago
- ☆16Nov 24, 2023Updated 2 years ago
- ☆13Feb 29, 2024Updated 2 years ago
- ☆16Aug 26, 2023Updated 2 years ago
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆26Jan 25, 2025Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆72Jul 10, 2024Updated last year
- Web site for standardml.org.☆37Oct 17, 2023Updated 2 years ago
- [NeurIPS 2025] A multimodal agent that can interact with its own PC in a multimodal manner.☆38Apr 23, 2026Updated last month