A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆15Feb 25, 2025Updated last year
Alternatives and similar repositories for HumanEval-V-Benchmark
Users that are interested in HumanEval-V-Benchmark are comparing it to the libraries listed below.
- Source code for ISSTA'24 paper "AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation"☆12Oct 21, 2024Updated last year
- Based on the CodeBert pre-trained model; tests on the target dataset either after fine-tuning or directly☆14Oct 19, 2021Updated 4 years ago
- [ACL'25 Findings] Official repo for "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task"☆41Apr 7, 2025Updated last year
- Source code embeddings for various programming languages☆17Jul 11, 2018Updated 7 years ago
- ☆25Aug 2, 2025Updated 10 months ago
- This is the official implementation for the paper 'Domain Adaptive Code Completion via Language Models and Decoupled Domain Databases'☆14Oct 4, 2023Updated 2 years ago
- We introduce FixEval, a dataset for competitive programming bug fixing along with a comprehensive test suite and show the necessity of e…☆26Aug 31, 2022Updated 3 years ago
- ☆14Jan 22, 2025Updated last year
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …☆11Sep 27, 2024Updated last year
- ☆49Jul 24, 2022Updated 3 years ago
- ☆10Mar 13, 2023Updated 3 years ago
- This repo illustrates how to evaluate the artifacts in the paper An Extensive Study on Pre-trained Models for Program Understanding and G…☆27Aug 12, 2022Updated 3 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆17Dec 19, 2024Updated last year
- PyTorch usage tips and tutorials☆12Apr 17, 2023Updated 3 years ago
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code☆79Jun 16, 2024Updated 2 years ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆25Oct 17, 2024Updated last year
- (AAAI 2026) OSVBench, a new benchmark for evaluating Large Language Models (LLMs) in generating complete specification code pertaining to…☆15May 13, 2025Updated last year
- ☆40Apr 6, 2026Updated 2 months ago
- ☆17Aug 18, 2025Updated 10 months ago
- ☆35Sep 14, 2025Updated 9 months ago
- [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing☆13Feb 9, 2025Updated last year
- [ICLR 2026] Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing☆29May 11, 2026Updated last month
- ☆10May 14, 2024Updated 2 years ago
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆26Nov 29, 2024Updated last year
- This is the code repo for the paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rewards".☆24Oct 28, 2024Updated last year
- [ICCV 2025] MRGen: Segmentation Data Engine for Underrepresented MRI Modalities☆41Sep 26, 2025Updated 8 months ago
- ☆13Nov 8, 2022Updated 3 years ago
- ☆44Dec 8, 2025Updated 6 months ago
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts☆18Mar 11, 2025Updated last year
- Implementation for AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications☆38Mar 24, 2026Updated 2 months ago
- ☆16Nov 24, 2023Updated 2 years ago
- ☆13Feb 29, 2024Updated 2 years ago
- Codebase for RecSys 2024 paper, The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation☆19Aug 7, 2024Updated last year
- ☆16Aug 26, 2023Updated 2 years ago
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆26Jan 25, 2025Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆72Jul 10, 2024Updated last year
- Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark☆28Apr 22, 2025Updated last year
- ☆10Jul 19, 2023Updated 2 years ago
- VulTrigger is a tool for identifying vulnerability-triggering statements across functions and investigating the effectiveness of funct…☆42Dec 29, 2023Updated 2 years ago