google-research/vision_transformer

☆12,626

Alternatives and similar repositories for vision_transformer

Users who are interested in vision_transformer are comparing it to the libraries listed below.

lucidrains / vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Py…
☆25,413 · Jun 22, 2026 · Updated 3 weeks ago
microsoft / Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
☆15,996 · Jul 24, 2024 · Updated last year
huggingface / pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights --…
☆36,986 · Updated this week
facebookresearch / deit
Official DeiT repository
☆4,357 · Mar 15, 2024 · Updated 2 years ago
facebookresearch / detr
End-to-End Object Detection with Transformers
☆15,336 · Mar 12, 2024 · Updated 2 years ago
openai / CLIP
CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image
☆33,994 · Mar 25, 2026 · Updated 3 months ago
facebookresearch / mae
PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377
☆8,364 · Jul 23, 2024 · Updated last year
jeonsworld / ViT-pytorch
PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
☆2,157 · Jun 7, 2022 · Updated 4 years ago
facebookresearch / dino
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
☆7,600 · Jul 3, 2024 · Updated 2 years ago
facebookresearch / ConvNeXt
Code release for ConvNeXt model
☆6,413 · Jan 8, 2023 · Updated 3 years ago
facebookresearch / moco
PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722
☆5,139 · Feb 3, 2026 · Updated 5 months ago
dk-liang / Awesome-Visual-Transformer
A collection of papers on transformers in vision. Awesome Transformer with Computer Vision (CV)
☆3,587 · Jan 7, 2025 · Updated last year
facebookresearch / segment-anything
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…
☆54,550 · Sep 18, 2024 · Updated last year
mlfoundations / open_clip
An open source implementation of CLIP.
☆13,986 · Updated this week
facebookresearch / detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
☆34,599 · Jun 7, 2026 · Updated last month
fundamentalvision / Deformable-DETR
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
☆3,996 · May 16, 2024 · Updated 2 years ago
open-mmlab / mmdetection
OpenMMLab Detection Toolbox and Benchmark
☆32,813 · Aug 21, 2024 · Updated last year
google-research / simclr
SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
☆4,506 · May 22, 2023 · Updated 3 years ago
jacobgil / pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, I…
☆12,913 · Jul 10, 2026 · Updated last week
huggingface / transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model…
☆162,626 · Updated this week
amusi / CVPR2026-Papers-with-Code
A collection of CVPR 2026 papers and open-source projects
☆22,748 · Mar 8, 2026 · Updated 4 months ago
google-research / google-research
Google Research
☆38,381 · Updated this week
yitu-opensource / T2T-ViT
ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
☆1,193 · Oct 27, 2023 · Updated 2 years ago
google-research / big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,491 · May 19, 2025 · Updated last year
facebookresearch / dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
☆13,109 · Jun 3, 2026 · Updated last month
microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,159 · Jan 23, 2026 · Updated 5 months ago
pytorch / vision
Datasets, Transforms and Models specific to Computer Vision
☆17,813 · Updated this week
whai362 / PVT
Official implementation of PVT series
☆1,901 · Oct 27, 2022 · Updated 3 years ago
open-mmlab / mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
☆9,877 · Aug 13, 2024 · Updated last year
google-research / scenic
Scenic: A JAX Library for Computer Vision Research and Beyond
☆3,818 · Jul 9, 2026 · Updated last week
facebookresearch / vissl
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
☆3,294 · Mar 3, 2024 · Updated 2 years ago
CompVis / latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
☆14,104 · Feb 29, 2024 · Updated 2 years ago
facebookresearch / SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
☆7,391 · Mar 16, 2026 · Updated 4 months ago
salesforce / LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
☆11,250 · Jun 2, 2026 · Updated last month
NVIDIA / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
☆8,979 · Updated this week
CompVis / taming-transformers
Taming Transformers for High-Resolution Image Synthesis
☆6,519 · Jul 30, 2024 · Updated last year
facebookresearch / DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
☆8,674 · May 31, 2024 · Updated 2 years ago
openai / guided-diffusion
☆7,404 · Jul 2, 2024 · Updated 2 years ago
haotian-liu / LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
☆24,923 · Aug 12, 2024 · Updated last year