multimodal-interpretability / nnn
Nearest Neighbor Normalization (EMNLP 2024)
☆18Updated 3 months ago
Alternatives and similar repositories for nnn:
Users that are interested in nnn are comparing it to the libraries listed below
- Code for paper: Unified Text-to-Image Generation and Retrieval☆13Updated 7 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 2 months ago
- ☆38Updated 3 months ago
- visual question answering prompting recipes for large vision-language models☆24Updated 5 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆34Updated 2 months ago
- Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models☆26Updated last year
- [ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning☆38Updated last year
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives☆26Updated 3 months ago
- Preference Learning for LLaVA☆37Updated 3 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆35Updated 6 months ago
- Implementation of our paper, Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination..☆17Updated last year
- Code and data for ImageCoDe, a contextual vison-and-language benchmark☆39Updated 11 months ago
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities☆37Updated 5 months ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)☆21Updated 6 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆40Updated 11 months ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆18Updated 8 months ago
- [ICLR 2023] This is the code repo for our ICLR‘23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆50Updated 7 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆57Updated 3 months ago
- The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Pro…☆29Updated last year
- Implementation for the paper "Reliable Visual Question Answering Abstain Rather Than Answer Incorrectly" (ECCV 2022: https//arxiv.org/abs…☆33Updated last year
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 3 weeks ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆41Updated 4 months ago
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆79Updated 9 months ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality☆76Updated last year
- Matryoshka Multimodal Models☆97Updated last month
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆49Updated 4 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆49Updated 8 months ago
- Official Pytorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)☆57Updated 8 months ago
- [ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'☆20Updated last month
- An Enhanced CLIP Framework for Learning with Synthetic Captions☆26Updated 2 months ago