OpenDFM / MULTI-Benchmark
MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images
☆25 Updated 6 months ago
Related projects:
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs) ☆32 Updated 10 months ago
- ☆46 Updated 10 months ago
- Towards Large Multimodal Models as Visual Foundation Agents ☆87 Updated 3 weeks ago
- This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Co… ☆66 Updated 2 months ago
- [CVPR2024] This is the official implementation of MP5 ☆72 Updated 2 months ago
- [EMNLP 2022] The baseline code for META-GUI dataset ☆10 Updated 2 months ago
- ⛏💎 STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆27 Updated 8 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆36 Updated 2 months ago
- ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning. ☆101 Updated 2 weeks ago
- ☆33 Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆51 Updated 3 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆39 Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆36 Updated 5 months ago
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆59 Updated 7 months ago
- VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆21 Updated 2 months ago
- GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes fr… ☆57 Updated 2 months ago
- GUICourse: From General Vision Language Models to Versatile GUI Agents ☆68 Updated 2 months ago
- ☆40 Updated 5 months ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆21 Updated 2 months ago
- ☆27 Updated last month
- Multi-modal code generation problems. ☆15 Updated 2 weeks ago
- ☆11 Updated 4 months ago
- PyTorch implementation of StableMask (ICML'24) ☆11 Updated 2 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆73 Updated 2 months ago
- Official GitHub repo of G-LLaVA ☆116 Updated 3 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆23 Updated 2 months ago
- ☆29 Updated 2 months ago
- ☆53 Updated 7 months ago
- The Official Code Repository for GUI-World. ☆33 Updated last month
- Code for our paper "All in an Aggregated Image for In-Image Learning" ☆27 Updated 5 months ago