hila-chefer / Transformer-MM-ExplainabilityLinks

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

☆870

Alternatives and similar repositories for Transformer-MM-Explainability

Users that are interested in Transformer-MM-Explainability are comparing it to the libraries listed below

Sorting:

hila-chefer / Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize …
☆1,945Updated last year
jacobgil / vit-explain
Explainability for Vision Transformers
☆1,010Updated 3 years ago
Zasder3 / train-CLIP
A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
☆713Updated 3 years ago
Sense-GVT / DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
☆666Updated 3 years ago
mlfoundations / wise-ft
Robust fine-tuning of zero-shot models
☆744Updated 3 years ago
lucidrains / CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
☆1,181Updated last year
NVlabs / GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
☆773Updated 3 years ago
bytedance / ibot
iBOT : Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
☆748Updated 3 years ago
facebookresearch / moco-v3
PyTorch implementation of MoCo v3 https//arxiv.org/abs/2104.02057
☆1,298Updated 3 years ago
ashkamath / mdetr
☆1,033Updated 3 years ago
facebookresearch / SLIP
Code release for SLIP Self-supervision meets Language-Image Pre-training
☆782Updated 2 years ago
lucidrains / x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
☆716Updated 2 years ago
KMnP / vpt
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
☆1,173Updated 2 years ago
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,218Updated last year
xxxnell / how-do-vits-work
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
☆819Updated 3 years ago
microsoft / SimMIM
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
☆998Updated 3 years ago
microsoft / UniCL
[CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space"
☆403Updated last year
samiraabnar / attention_flow
☆254Updated 4 years ago
salesforce / ALBEF
Code for ALBEF: a new vision-language pre-training method
☆1,718Updated 3 years ago
microsoft / esvit
EsViT: Efficient self-supervised Vision Transformers
☆412Updated 2 years ago
KaiyangZhou / CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
☆2,087Updated last year
facebookresearch / msn
Masked Siamese Networks for Label-Efficient Learning (https://arxiv.org/abs/2204.07141)
☆460Updated 3 years ago
lukemelas / PyTorch-Pretrained-ViT
Vision Transformer (ViT) in PyTorch
☆843Updated 3 years ago
DirtyHarryLYL / Transformer-in-Vision
Recent Transformer-based CV and related works.
☆1,333Updated 2 years ago
EPFL-VILAB / MultiMAE
MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022
☆597Updated 2 years ago
Alibaba-MIIL / ImageNet21K
Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(NeurIPS, 2021) paper
☆774Updated 2 years ago
lucidrains / flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
☆1,265Updated 3 years ago
dandelin / ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
☆1,499Updated last year
locuslab / convmixer
Implementation of ConvMixer for "Patches Are All You Need? 🤷"
☆1,077Updated 2 years ago
LAION-AI / CLIP_benchmark
CLIP-like model evaluation
☆779Updated 2 months ago