zer0int / CLIP-text-image-interpretability
Get CLIP ViT text tokens about an image, visualize attention as a heatmap.
☆10 · Updated last year
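The one-line description above ("get CLIP ViT text tokens about an image, visualize attention as a heatmap") can be illustrated with a minimal sketch. This is not the repository's own code: it assumes the Hugging Face `transformers` CLIP implementation, `torch`, `matplotlib`, `Pillow`, and a hypothetical local image `example.jpg`, and a fixed prompt stands in for the repository's own procedure for finding text tokens that describe the image.

```python
# Minimal sketch only; not the code from zer0int/CLIP-text-image-interpretability.
# Encodes an image and a text prompt with CLIP ViT-B/32, then plots the last
# vision layer's CLS-token attention over the image patches as a heatmap.
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")            # hypothetical input image
texts = ["a photo of a cat"]                 # hypothetical text prompt

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

print("image-text similarity logits:", outputs.logits_per_image)

# Last vision-encoder attention layer: (batch, heads, tokens, tokens).
# Token 0 is the CLS token; the remaining 49 tokens are the 7x7 image patches
# of ViT-B/32 at 224x224 input resolution.
attn = outputs.vision_model_output.attentions[-1]
cls_to_patches = attn[0].mean(dim=0)[0, 1:]  # average heads, take CLS row
heatmap = cls_to_patches.reshape(7, 7).numpy()

plt.imshow(heatmap, cmap="viridis")
plt.colorbar()
plt.title("CLIP ViT-B/32: CLS-token attention over image patches")
plt.savefig("attention_heatmap.png")
```

Note that the 7×7 reshape assumes the ViT-B/32 checkpoint at 224×224 input; a patch-14 model such as `openai/clip-vit-large-patch14` would give a 16×16 patch grid instead.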
Alternatives and similar repositories for CLIP-text-image-interpretability:
Users interested in CLIP-text-image-interpretability are comparing it to the repositories listed below.
- ☆41 · Updated last year
- Official PyTorch implementation of the paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…" ☆54 · Updated 6 months ago
- ☆34 · Updated 11 months ago
- OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆45 · Updated last month
- Awesome works based on SSM and Mamba ☆17 · Updated 9 months ago
- Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing ☆22 · Updated last month
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆18 · Updated 2 months ago
- Multimodal Video Understanding Framework (MVU) ☆26 · Updated 8 months ago
- Official code repository for the paper "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆28 · Updated 3 months ago
- Official implementation of Attentive Mask CLIP (ICCV 2023, https://arxiv.org/abs/2212.08653) ☆26 · Updated 7 months ago
- Implementation of "the first large-scale multimodal mixture of experts models" from the paper "Multimodal Contrastive Learning with…" ☆25 · Updated 2 months ago
- An interactive demo based on Segment-Anything for stroke-based painting that enables human-like painting. ☆34 · Updated last year
- CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT & ResNet models ☆17 · Updated 4 months ago
- [IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation ☆37 · Updated 11 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆14 · Updated 3 months ago
- Simple implementation of TinyGPT-V in super simple Zeta lego blocks ☆15 · Updated 2 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆21 · Updated 3 months ago
- Retrieval-Augmented Personalization ☆12 · Updated last month
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding" … ☆45 · Updated 2 months ago
- [NIPS 2023] Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector" ☆35 · Updated 10 months ago
- Implementation of the paper "BRAVE: Broadening the visual encoding of vision-language models" ☆22 · Updated this week
- Masked Vision-Language Transformer in Fashion ☆33 · Updated last year
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆40 · Updated 9 months ago
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆70 · Updated 8 months ago
- [AAAI 2025] ChatterBox: Multi-round Multimodal Referring and Grounding, multimodal multi-round dialogues ☆50 · Updated last month
- Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low-Rank Adapters (LoRA). ☆19 · Updated 5 months ago
- [ECCV 2024] Soft Prompt Generation for Domain Generalization ☆17 · Updated 3 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆56 · Updated last week