MMMU-Benchmark / MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Related projects:
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions
- Official code for the paper "Mantis: Multi-Image Instruction Tuning"
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
- Long Context Transfer from Language to Vision
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision)
- The model, data, and code for the visual GUI agent SeeClick
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
- Official repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
- [ECCV 2024] Code for the paper "An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models"
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
- Aligning LMMs with Factually Augmented RLHF
- LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing image inputs
- MMICL, a state-of-the-art VLM with in-context learning ability, from PKU
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
- Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
- HPT - Open Multimodal LLMs from HyperGAI
- [ACL 2024] Progressive LLaMA with Block Expansion
- 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs
- E5-V: Universal Embeddings with Multimodal Large Language Models
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
- ControlLLM: Augment Language Models with Tools by Searching on Graphs