DirtyHarryLYL/Transformer-in-Vision

Recent Transformer-based CV and related works.

☆1,339

Alternatives and similar repositories for Transformer-in-Vision

Users who are interested in Transformer-in-Vision are comparing it to the repositories listed below.

dk-liang / Awesome-Visual-Transformer
Collect some papers about transformer with vision. Awesome Transformer with Computer Vision (CV)
☆3,565 · Jan 7, 2025 · Updated last year
Yangzhangcst / Transformer-in-Computer-Vision
A paper list of some recent Transformer-based CV works.
☆1,431 · Nov 19, 2025 · Updated 3 months ago
cmhungsteve / Awesome-Transformer-Attention
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
☆5,016 · Jul 30, 2024 · Updated last year
facebookresearch / deit
Official DeiT repository
☆4,326 · Mar 15, 2024 · Updated last year
DirtyHarryLYL / LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
☆874 · Mar 8, 2025 · Updated last year
yuewang-cuhk / awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
☆1,155 · Aug 19, 2022 · Updated 3 years ago
microsoft / Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
☆15,721 · Jul 24, 2024 · Updated last year
huggingface / pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights --…
☆36,420 · Feb 26, 2026 · Updated last week
ashkamath / mdetr
☆1,047 · Oct 3, 2022 · Updated 3 years ago
facebookresearch / ConvNeXt
Code release for ConvNeXt model
☆6,302 · Jan 8, 2023 · Updated 3 years ago
yitu-opensource / T2T-ViT
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
☆1,192 · Oct 27, 2023 · Updated 2 years ago
IDEA-Research / awesome-detection-transformer
Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
☆1,397 · Jul 4, 2024 · Updated last year
lijiaman / awesome-transformer-for-vision
☆280 · Mar 22, 2021 · Updated 4 years ago
microsoft / SimMIM
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".
☆1,026 · Sep 29, 2022 · Updated 3 years ago
DirtyHarryLYL / HOI-Learning-List
A list of Human-Object Interaction Learning.
☆705 · Oct 24, 2025 · Updated 4 months ago
whai362 / PVT
Official implementation of PVT series
☆1,888 · Oct 27, 2022 · Updated 3 years ago
facebookresearch / dino
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
☆7,459 · Jul 3, 2024 · Updated last year
ucasligang / awesome-MIM
Reading list for research topics in Masked Image Modeling
☆338 · Dec 3, 2024 · Updated last year
hila-chefer / Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize …
☆1,976 · Jan 24, 2024 · Updated 2 years ago
jayleicn / ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning…
☆729 · Aug 8, 2023 · Updated 2 years ago
facebookresearch / SLIP
Code release for SLIP Self-supervision meets Language-Image Pre-training
☆787 · Feb 9, 2023 · Updated 3 years ago
facebookresearch / mae
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
☆8,230 · Jul 23, 2024 · Updated last year
jason718 / awesome-self-supervised-learning
A curated list of awesome self-supervised methods
☆6,363 · Feb 24, 2026 · Updated last week
Sense-GVT / DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
☆675 · Sep 19, 2022 · Updated 3 years ago
HobbitLong / PyContrast
PyTorch implementation of Contrastive Learning methods
☆1,995 · Oct 4, 2023 · Updated 2 years ago
microsoft / GLIP
Grounded Language-Image Pre-training
☆2,575 · Jan 24, 2024 · Updated 2 years ago
zdou0830 / METER
METER: A Multimodal End-to-end TransformER Framework
☆376 · Nov 16, 2022 · Updated 3 years ago
ttengwang / Awesome_Prompting_Papers_in_Computer_Vision
A curated list of prompt-based papers in computer vision and vision-language learning.
☆925 · Dec 18, 2023 · Updated 2 years ago
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,232 · Jun 28, 2024 · Updated last year
sail-sg / poolformer
PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
☆1,367 · Jun 1, 2024 · Updated last year
pengzhiliang / MAE-pytorch
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
☆2,687 · Jul 25, 2023 · Updated 2 years ago
google-research / scenic
Scenic: A Jax Library for Computer Vision Research and Beyond
☆3,772 · Mar 2, 2026 · Updated last week
google-research / vision_transformer
☆12,332 · Updated this week
pliang279 / awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
☆6,824 · Aug 20, 2024 · Updated last year
raoyongming / DenseCLIP
[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
☆545 · Sep 15, 2023 · Updated 2 years ago
zihangJiang / TokenLabeling
PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
☆433 · Sep 5, 2023 · Updated 2 years ago
diff-usion / Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
☆12,273 · Aug 1, 2024 · Updated last year
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆209 · Dec 18, 2022 · Updated 3 years ago
clip-vil / CLIP-ViL
[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
☆420 · Oct 28, 2022 · Updated 3 years ago