multimodal-interpretability / nnnLinks
Nearest Neighbor Normalization (EMNLP 2024)
☆19Updated last year
Alternatives and similar repositories for nnn
Users that are interested in nnn are comparing it to the libraries listed below
Sorting:
- A Comprehensive Benchmark for Robust Multi-image Understanding☆17Updated last year
- ☆46Updated last year
- CLIP-MoE: Mixture of Experts for CLIP☆55Updated last year
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]☆18Updated 5 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Updated last year
- Preference Learning for LLaVA☆59Updated last year
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives☆39Updated 5 months ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆63Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆84Updated 3 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆52Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆69Updated 9 months ago
- Data-Efficient Multimodal Fusion on a Single GPU☆68Updated last year
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆88Updated 4 months ago
- ☆24Updated 2 years ago
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"☆50Updated 7 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …☆62Updated last year
- ☆71Updated last year
- Implementation of our paper, Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination..☆20Updated 2 years ago
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning☆50Updated last year
- [ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models☆86Updated last year
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆31Updated last year
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models☆46Updated last year
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs☆23Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning☆34Updated 11 months ago
- [ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'☆22Updated last year
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval☆22Updated 7 months ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆18Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Updated 7 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"☆44Updated 9 months ago