PyTorch version of Vision Transformer (ViT) with pretrained models. This is part of the CASL (https://casl-project.github.io/) and ASYML projects.
☆360Nov 23, 2020Updated 5 years ago
Alternatives and similar repositories for vision-transformer-pytorch
Users interested in vision-transformer-pytorch are comparing it to the repositories listed below
- PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)☆2,123Jun 7, 2022Updated 3 years ago
- ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet☆1,192Oct 27, 2023Updated 2 years ago
- Vision Transformer (ViT) in PyTorch☆847Mar 2, 2022Updated 4 years ago
- Repository for ACL2020 paper "Refer360°: A Referring Expression Recognition Dataset in 360° Images"☆13Jun 26, 2021Updated 4 years ago
- Official DeiT repository☆4,325Mar 15, 2024Updated last year
- ☆12,332Updated this week
- This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".☆1,026Sep 29, 2022Updated 3 years ago
- A collection of papers on transformers for vision. Awesome Transformer with Computer Vision (CV)☆3,565Jan 7, 2025Updated last year
- This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".☆15,721Jul 24, 2024Updated last year
- [CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting☆544Sep 15, 2023Updated 2 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)☆89Jun 12, 2023Updated 2 years ago
- PyTorch implementation of MoCo v3: https://arxiv.org/abs/2104.02057☆1,319Nov 25, 2021Updated 4 years ago
- End-to-End Object Detection with Fully Convolutional Network☆495Jan 10, 2022Updated 4 years ago
- Deep Learning for Video Retrieval by Natural Language☆11Oct 20, 2019Updated 6 years ago
- Code release for SLIP: Self-supervision meets Language-Image Pre-training☆787Feb 9, 2023Updated 3 years ago
- Official Codes and Pretrained Models for Dynamic MLP, CVPR2022, https://arxiv.org/abs/2203.03253☆88Mar 8, 2022Updated 3 years ago
- Self-supervised vIsion Transformer (SiT)☆337Dec 24, 2022Updated 3 years ago
- ☆23Oct 29, 2020Updated 5 years ago
- Two simple and effective vision transformer designs that are on par with the Swin Transformer☆608Feb 14, 2023Updated 3 years ago
- [CVPR 2021 & IJCV 2024] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers☆1,110Sep 2, 2024Updated last year
- Deformable DETR: Deformable Transformers for End-to-End Object Detection.☆3,901May 16, 2024Updated last year
- This is an official implementation for "Contextual Transformer Networks for Visual Recognition".☆539Aug 8, 2021Updated 4 years ago
- [CVPR2021, PAMI2023] End-to-End Object Detection with Learnable Proposal☆1,348Apr 30, 2023Updated 2 years ago
- Implementation of the Swin Transformer in PyTorch.☆857Mar 29, 2021Updated 4 years ago
- [CVPR2021] De-rendering the World's Revolutionary Artefacts☆58Feb 1, 2023Updated 3 years ago
- Accelerating T2t-ViT by 1.6-3.6x.☆258Nov 25, 2021Updated 4 years ago
- CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval☆129Feb 26, 2020Updated 6 years ago
- The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights --…☆36,420Feb 26, 2026Updated last week
- OpenMMLab Self-Supervised Learning Toolbox and Benchmark☆3,296Jun 25, 2023Updated 2 years ago
- Recent Transformer-based CV and related works.☆1,340Aug 22, 2023Updated 2 years ago
- Implementation of the "Deep Hierarchical Representation of Point Cloud Videos via Spatio-Temporal Decomposition" paper.☆21Apr 13, 2021Updated 4 years ago
- Deep learning models and datasets for the medical industry, open-sourced progressively☆13Dec 30, 2021Updated 4 years ago
- ☆13Nov 7, 2021Updated 4 years ago
- PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722☆5,116Feb 3, 2026Updated last month
- PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377☆8,230Jul 23, 2024Updated last year
- Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners☆2,687Jul 25, 2023Updated 2 years ago
- This is an official implementation for "Self-Supervised Learning with Swin Transformers".☆667May 13, 2021Updated 4 years ago
- Introduction and scripts for the paper "PartImageNet: A Large, High-Quality Dataset of Parts" (Ju He, Shuo Yang, Shaokang Yang, Adam Kort…☆136Mar 20, 2025Updated 11 months ago
- The provided code trains HardNet8 (described in the thesis) on multiple datasets in the Liberty or AMOS format.☆26Dec 8, 2022Updated 3 years ago