zer0int / CLIP-text-image-interpretability
Get CLIP ViT text tokens for an image and visualize the attention as a heatmap.
☆15 · Updated 2 years ago
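The attention-heatmap half of that description reduces to a few lines. Below is a minimal, hypothetical sketch (not this repository's actual code) using Hugging Face `transformers`; the model name and image path are placeholder assumptions:

```python
# Hypothetical sketch: overlay a CLIP ViT's last-layer [CLS]-to-patch
# attention on the input image. Model name and image path are placeholders.
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg").convert("RGB")  # placeholder input
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model.vision_model(**inputs, output_attentions=True)

# Last-layer attention: (batch, heads, tokens, tokens); token 0 is [CLS].
attn = outputs.attentions[-1].mean(dim=1)[0]  # average over heads
cls_to_patches = attn[0, 1:]                  # [CLS] row, patch columns

side = int(cls_to_patches.numel() ** 0.5)     # 7x7 patch grid for ViT-B/32 at 224px
heatmap = cls_to_patches.reshape(side, side).numpy()

plt.imshow(image.resize((224, 224)))          # approximates the processor's crop
plt.imshow(heatmap, extent=(0, 224, 224, 0), alpha=0.5, cmap="jet")
plt.axis("off")
plt.show()
```

The text-token half of the description (finding CLIP text tokens that describe the image) is specific to this repository and is not sketched here.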
Alternatives and similar repositories for CLIP-text-image-interpretability
Users interested in CLIP-text-image-interpretability are comparing it to the libraries listed below.
- LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning ☆195 · Updated last year
- Official implementation of Attentive Mask CLIP (ICCV 2023, https://arxiv.org/abs/2212.08653) ☆34 · Updated last year
- ☆61 · Updated 2 years ago
- [IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation ☆38 · Updated last year
- [Paper][AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations ☆153 · Updated last year
- ☆100 · Updated last year
- Implementation of the paper "BRAVE: Broadening the visual encoding of vision-language models" ☆25 · Updated 3 weeks ago
- PyTorch code adding new features to Segment-Anything; the features support batch input on the fu… ☆166 · Updated 2 years ago
- Visual Prompt Augmentation ☆37 · Updated 2 years ago
- [CVPR'24] Code for running inference and training for "Segment and Caption Anything" (SCA), links for downloadin… ☆231 · Updated last year
- ☆61 · Updated last year
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆336 · Updated last year
- Implementation of "the first large-scale multimodal mixture-of-experts models" from the paper "Multimodal Contrastive Learning with… ☆36 · Updated last week
- Easy-to-use, efficient code for extracting OpenAI CLIP (global/grid) features from images and text ☆136 · Updated last year
- Official PyTorch implementation of "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des… ☆55 · Updated 5 months ago
- Awesome works based on SSM and Mamba ☆17 · Updated last year
- Zero-shot image instance segmentation with OpenAI's CLIP + Meta's SAM (see the sketch after this list) ☆74 · Updated 2 years ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆79 · Updated last month
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆93 · Updated last year
- [CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval ☆35 · Updated 4 months ago
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation ☆79 · Updated last year
- Frontiers in Intelligent Colonoscopy [ColonSurvey | ColonINST | ColonGPT] ☆101 · Updated last month
- [AAAI 2024] TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training ☆106 · Updated 2 years ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of… ☆190 · Updated 2 weeks ago
- Official implementation of [ICLR'24] INTR: Interpretable Transformer for Fine-grained Image Classification ☆57 · Updated last year
- Code for the paper "Explain Any Concept: Segment Anything Meets Concept-Based Explanation" (poster @ NeurIPS 2023) ☆46 · Updated 2 years ago
- [ICCV 2023] PyTorch code for the paper "Self-Supervised Cross-View Representation Reconstruction for Change Captioning" ☆20 · Updated 4 months ago
- Official implementation of the CrossMAE paper "Rethinking Patch Dependence for Masked Autoencoders" ☆131 · Updated 9 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆44 · Updated 9 months ago
- Plotting heatmaps with the self-attention of the [CLS] token in the last layer ☆50 · Updated 3 years ago
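Of the combinations above, the zero-shot CLIP + SAM pipeline is simple enough to sketch. The following is a hypothetical outline, not the listed repository's code; the checkpoint path, image path, and label set are placeholder assumptions:

```python
# Hypothetical sketch of zero-shot instance labeling with SAM + CLIP:
# SAM proposes class-agnostic masks, CLIP labels each masked crop.
# Checkpoint path, image path, and label set are placeholders.
import numpy as np
import torch
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
from transformers import CLIPModel, CLIPProcessor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg").convert("RGB")  # placeholder input
labels = ["a dog", "a cat", "a car"]              # placeholder label set

masks = mask_generator.generate(np.array(image))  # class-agnostic masks from SAM

for m in masks:
    x, y, w, h = map(int, m["bbox"])              # XYWH box around each mask
    crop = image.crop((x, y, x + w, y + h))
    inputs = processor(text=labels, images=crop, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = clip(**inputs).logits_per_image[0]
    print(labels[logits.argmax().item()], float(logits.softmax(-1).max()))
```

Because SAM's masks are class-agnostic and CLIP scores the crops against free-text prompts, no segmentation-specific training is needed, which is what makes the combination zero-shot.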