Lillianwei-h / MMIE
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Related projects
Alternatives and complementary repositories for MMIE
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
- "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models
- ViLLA: Fine-grained vision-language representation learning from real-world data
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT.
- Official implementation of ECCV24 paper: POA
- [arXiv'24 & NeurIPSW'24] MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
- Implementation of the model "MC-ViT" from the paper "Memory Consolidation Enables Long-Context Video Understanding"
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
- Code for "Merging Text Transformers from Different Initializations"
- Code for T-MARS data filtering
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
- Code and data for ACL 2024 paper on "Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space"
- arXiv 23 "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs"
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.