HumanEval-V / HumanEval-V-Benchmark
A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks
☆8 · Updated last month
Alternatives and similar repositories for HumanEval-V-Benchmark:
Users who are interested in HumanEval-V-Benchmark are comparing it to the repositories listed below.
- ☆89 · Updated 3 weeks ago
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆56 · Updated 3 months ago
- ☆16 · Updated last week
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆47 · Updated 5 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆47 · Updated 9 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆13 · Updated last month
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆90 · Updated last week
- Reproducing R1 for Code with Reliable Rewards ☆167 · Updated last week
- ☆37 · Updated last week
- [EMNLP 2024] Multi-modal reasoning problems via code generation. ☆22 · Updated 2 months ago
- XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts ☆30 · Updated 9 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆91 · Updated 10 months ago
- A Survey on the Honesty of Large Language Models ☆57 · Updated 4 months ago
- [ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training ☆29 · Updated 2 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆31 · Updated 9 months ago
- ☆44 · Updated 5 months ago
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?" ☆43 · Updated last month
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation ☆34 · Updated 2 weeks ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆19 · Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 6 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 · Updated last year
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆13 · Updated last week
- ☆50 · Updated this week
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆59 · Updated 6 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆28 · Updated 3 months ago
- Extending context length of visual language models ☆11 · Updated 4 months ago
- ☆18 · Updated 6 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ☆38 · Updated 3 weeks ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" ☆77 · Updated last week
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆25 · Updated 4 months ago