OpenDFM / MULTI-Benchmark
MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images
☆38Updated 3 weeks ago
Alternatives and similar repositories for MULTI-Benchmark
Users who are interested in MULTI-Benchmark are comparing it to the repositories listed below
- ☆51Updated last year
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs)☆46Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆42Updated 10 months ago
- A Self-Training Framework for Vision-Language Reasoning☆78Updated 3 months ago
- [ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…☆53Updated this week
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆44Updated 6 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆121Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆67Updated 3 months ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.☆50Updated this week
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆79Updated 3 months ago
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆76Updated last week
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆58Updated 5 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆56Updated 10 months ago
- ☆75Updated 4 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆46Updated 5 months ago
- ☆73Updated last year
- [ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training☆32Updated 3 months ago
- Code for our Paper "All in an Aggregated Image for In-Image Learning"☆30Updated last year
- ☆63Updated last year
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆34Updated 10 months ago
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs☆113Updated 3 weeks ago
- ☆29Updated 7 months ago
- ☆97Updated 2 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆55Updated 8 months ago
- Official repository of MMDU dataset☆90Updated 7 months ago
- Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)☆86Updated 7 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆102Updated last year
- ☆41Updated 4 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆138Updated 3 months ago
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆106Updated this week