Plotting heatmaps of the [CLS] token's self-attention in the last layer.
☆50 · May 11, 2022 · Updated 3 years ago
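The repository's own code is not reproduced here. The following is a minimal NumPy sketch of the general technique it describes, not its implementation. It assumes you already have the last layer's attention weights as an array of shape `(heads, tokens, tokens)`, with [CLS] at token index 0 and the image patches following it on a square, row-major grid. `cls_attention_heatmap` is a hypothetical helper name, not an API of the repository or of CLIP.

```python
import numpy as np


def cls_attention_heatmap(attn: np.ndarray) -> np.ndarray:
    """Turn one layer's attention into a normalised [CLS] heatmap.

    Assumption: attn has shape (num_heads, num_tokens, num_tokens), token 0 is
    [CLS], and tokens 1: are image patches laid out row-major on a square grid.
    """
    if attn.ndim != 3 or attn.shape[1] != attn.shape[2]:
        raise ValueError("expected attention of shape (heads, tokens, tokens)")
    # Row 0 holds how strongly [CLS] attends to every token; drop [CLS]->[CLS].
    cls_to_patches = attn[:, 0, 1:]          # (heads, num_patches)
    # Average over heads to get a single score per patch.
    scores = cls_to_patches.mean(axis=0)     # (num_patches,)
    side = int(round(np.sqrt(scores.size)))
    if side * side != scores.size:
        raise ValueError("patch count is not a perfect square")
    grid = scores.reshape(side, side)
    # Min-max normalise to [0, 1] so the grid can be drawn as a heatmap.
    lo, hi = grid.min(), grid.max()
    if hi > lo:
        return (grid - lo) / (hi - lo)
    return np.zeros_like(grid)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Dummy attention shaped like ViT-B/32 at 224x224: 12 heads, 1 [CLS] + 7*7 patches.
    logits = rng.normal(size=(12, 50, 50))
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # row-wise softmax
    heatmap = cls_attention_heatmap(attn)
    print(heatmap.shape)  # (7, 7)
```

Averaging over heads is only one choice; inspecting individual heads (`attn[h, 0, 1:]`) often gives sharper maps. The resulting grid can then be upsampled to the input resolution and overlaid on the image, for example with matplotlib's `imshow` and a partial `alpha`.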
Alternatives and similar repositories for CLIP-self-attention-visualization
Users interested in CLIP-self-attention-visualization are comparing it to the repositories listed below.
- The official implementations of Noise-Informed Diffusion-Generated Image Detection With Anomaly Attention (TIFS 2025) ☆18 · Jun 23, 2025 · Updated 8 months ago
- ☆61 · Jul 11, 2024 · Updated last year
- ☆15 · Mar 19, 2024 · Updated last year
- Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs ☆14 · Apr 19, 2025 · Updated 10 months ago
- General-purpose Visual Understanding Evaluation ☆20 · Dec 21, 2023 · Updated 2 years ago
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering ☆20 · Sep 21, 2024 · Updated last year
- A fast data loader for ImageNet on PyTorch. ☆18 · Mar 17, 2019 · Updated 6 years ago
- ☆45 · Oct 5, 2025 · Updated 5 months ago
- Code of paper [CVPR'24: Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?] ☆23 · Apr 2, 2024 · Updated last year
- [SatML 2024] Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk ☆16 · Mar 15, 2025 · Updated 11 months ago
- This code is for pose-guided human animation from a single image. ☆16 · Jun 18, 2021 · Updated 4 years ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆55 · Aug 16, 2024 · Updated last year
- An effective image quality assessment framework combining Segment Anything (SAM). This is the official implementation of our paper. ☆24 · Jun 29, 2023 · Updated 2 years ago
- [CVPR'25 Oral] LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models ☆49 · Aug 28, 2025 · Updated 6 months ago
- [ICCV 2021] Target Adaptive Context Aggregation for Video Scene Graph Generation ☆59 · Aug 27, 2022 · Updated 3 years ago
- [AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA ☆27 · Jul 12, 2024 · Updated last year
- ☆23 · Sep 28, 2023 · Updated 2 years ago
- [Pattern Recognition 25] CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks ☆467 · Mar 1, 2025 · Updated last year
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval ☆26 · Jun 28, 2021 · Updated 4 years ago
- [MM 2024 Oral] Refiner for AIGC ☆29 · Jul 29, 2024 · Updated last year
- Semantic Image Manipulation using Scene Graphs (CVPR 2020) ☆60 · May 1, 2023 · Updated 2 years ago
- Simple script to compute CLIP-based scores given a DALL-e trained model. ☆29 · Jun 13, 2021 · Updated 4 years ago
- ☆27 · Aug 28, 2023 · Updated 2 years ago
- This repository houses the code for the paper - "The Neglected of VLMs" ☆30 · Dec 31, 2025 · Updated 2 months ago
- [ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decode… ☆903 · Aug 24, 2023 · Updated 2 years ago
- ☆13 · Dec 9, 2020 · Updated 5 years ago
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022 ☆31 · May 29, 2023 · Updated 2 years ago
- ☆14 · May 25, 2021 · Updated 4 years ago
- Path integral based convolution and pooling ☆30 · Feb 16, 2023 · Updated 3 years ago
- ☆29 · Jun 24, 2021 · Updated 4 years ago
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆166 · Nov 16, 2025 · Updated 3 months ago
- ☆661 · Nov 28, 2023 · Updated 2 years ago
- ☆33 · Nov 12, 2018 · Updated 7 years ago
- [ICLR 2026] ParallelBench: Understanding the Tradeoffs of Parallel Decoding in Diffusion LLMs ☆42 · Updated this week
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022 ☆11 · Apr 13, 2025 · Updated 10 months ago
- Project aimed to segment petrographic images taken from rock samples thin sections, in order to classify rock types. ☆12 · Sep 30, 2021 · Updated 4 years ago
- ☆46 · Dec 13, 2023 · Updated 2 years ago
- A very, very simple hard-coded chatbot ☆10 · May 25, 2018 · Updated 7 years ago
- It's a personal blog adapted from cayman-blog ☆11 · Jan 17, 2023 · Updated 3 years ago