visual-haystacks / vhs_benchmark
🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
☆26 · Updated last month
Alternatives and similar repositories for vhs_benchmark:
Users interested in vhs_benchmark are comparing it to the repositories listed below.
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension (☆65 · Updated 9 months ago)
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models (☆40 · Updated 2 weeks ago)
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models (☆74 · Updated 6 months ago)
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" (☆41 · Updated 3 weeks ago)
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models (☆43 · Updated 9 months ago)
- Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning" (☆42 · Updated last year)
- Code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025] (☆60 · Updated 2 weeks ago)
- 🔥 [ICLR 2025] Official PyTorch model for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" (☆12 · Updated last month)
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" (☆58 · Updated last year)
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision (☆59 · Updated 8 months ago)
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) (☆28 · Updated 5 months ago)
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" (☆50 · Updated 5 months ago)
- ☆37 · Updated 2 months ago
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" (☆54 · Updated 7 months ago)
- ☆54 · Updated 11 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning (☆80 · Updated 10 months ago)
- [NeurIPS 2024] Official code for "Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs" (IMA) (☆18 · Updated 5 months ago)
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models (☆70 · Updated 3 months ago)
- ☆49 · Updated last year
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive" (https://arxiv.or…) (☆115 · Updated 8 months ago)
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding (☆33 · Updated this week)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale (☆35 · Updated 3 months ago)
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…) (☆25 · Updated last week)
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" (☆35 · Updated 7 months ago)
- VPEval codebase from "Visual Programming for Text-to-Image Generation and Evaluation" (NeurIPS 2023) (☆44 · Updated last year)
- Public code repo for the EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners" (☆13 · Updated 5 months ago)
- Official implementation of the paper "The Hidden Language of Diffusion Models" (☆72 · Updated last year)
- Preference Learning for LLaVA (☆40 · Updated 4 months ago)
- Code for the paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models" (☆42 · Updated 5 months ago)
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models (☆70 · Updated 9 months ago)