visual-haystacks / vhs_benchmark
🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
☆26 · Updated last week
Alternatives and similar repositories for vhs_benchmark:
Users interested in vhs_benchmark are comparing it to the libraries listed below.
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆63 · Updated 8 months ago
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆79 · Updated 9 months ago
- [NAACL 2025] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆38 · Updated this week
- This repo contains evaluation code for the papaper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆116 · Updated 7 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆73 · Updated 5 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 · Updated last week
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆41 · Updated 4 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" ☆23 · Updated last month
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆32 · Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆59 · Updated 7 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆44 · Updated last year
- Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning." ☆40 · Updated 11 months ago
- Preference Learning for LLaVA ☆37 · Updated 3 months ago
- An instruction data generation system for multimodal language models. ☆31 · Updated 3 weeks ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆64 · Updated 2 months ago
- ☆54 · Updated 10 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 6 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆78 · Updated 10 months ago
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models ☆43 · Updated 8 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆59 · Updated 3 weeks ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆54 · Updated 5 months ago
- [ACL'24 Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆61 · Updated 5 months ago
- Official PyTorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP" ☆33 · Updated 3 months ago
- ☆61 · Updated 7 months ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆19 · Updated last year
- ☆47 · Updated last year
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 · Updated 4 months ago
- ☆28 · Updated 3 months ago
- ☆61 · Updated last month
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆54 · Updated last year