visual-haystacks / vhs_benchmark
🔥 Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
★ 21 · Updated last month
Related projects
Alternatives and complementary repositories for vhs_benchmark
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models · ★ 70 · Updated 2 months ago
- SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024) · ★ 14 · Updated 6 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) · ★ 43 · Updated 11 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… · ★ 107 · Updated 4 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) · ★ 52 · Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" · ★ 33 · Updated 3 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision · ★ 47 · Updated 4 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" · ★ 38 · Updated 7 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… · ★ 32 · Updated 5 months ago
- Official implementation of the paper The Hidden Language of Diffusion Models · ★ 69 · Updated 9 months ago
- Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning". · ★ 34 · Updated 8 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". · ★ 42 · Updated 3 weeks ago
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation · ★ 49 · Updated 7 months ago
- ★ 45 · Updated last year
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… · ★ 45 · Updated last month
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … · ★ 55 · Updated last month
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. · ★ 30 · Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning · ★ 24 · Updated 2 weeks ago
- Matryoshka Multimodal Models · ★ 82 · Updated this week
- Data-Efficient Multimodal Fusion on a Single GPU · ★ 47 · Updated 6 months ago
- Code for T-MARS data filtering · ★ 35 · Updated last year
- ★ 55 · Updated 6 months ago
- ★ 48 · Updated last year
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data · ★ 32 · Updated 8 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" · ★ 73 · Updated 7 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback · ★ 52 · Updated 2 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" · ★ 28 · Updated 2 weeks ago
- ★ 64 · Updated 4 months ago
- Recursive Visual Programming · ★ 16 · Updated this week
- SMILE: A Multimodal Dataset for Understanding Laughter · ★ 13 · Updated last year