kyegomez / Vit-RGTS

Open source implementation of "Vision Transformers Need Registers"

☆184

Alternatives and similar repositories for Vit-RGTS

Users who are interested in Vit-RGTS are comparing it to the libraries listed below.

TonyLianLong / CrossMAE
Official Implementation of the CrossMAE paper: Rethinking Patch Dependence for Masked Autoencoders
☆115 Updated 3 months ago
facebookresearch / webssl
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
☆168 Updated 3 months ago
wangf3014 / SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
☆161 Updated 9 months ago
naver-ai / rope-vit
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
☆371 Updated 7 months ago
wangf3014 / Mamba-Reg
☆70 Updated 5 months ago
dfan / webssl
Code for Scaling Language-Free Visual Representation Learning (WebSSL)
☆246 Updated 3 months ago
zh460045050 / V2L-Tokenizer
☆135 Updated last year
wysoczanska / clip_dinoiser
Official implementation of 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper.
☆251 Updated 9 months ago
Jiawei-Yang / Denoising-ViT
This is the official code release for our work, Denoising Vision Transformers.
☆372 Updated 8 months ago
yossigandelsman / clip_text_span
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
☆217 Updated 2 months ago
WalBouss / GEM
[CVPR24] Official Implementation of GEM (Grounding Everything Module)
☆127 Updated 3 months ago
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆89 Updated last month
OliverRensu / ARM
[ICLR2025] This repository is the official implementation of our Autoregressive Pretraining with Mamba in Vision
☆83 Updated 2 months ago
AILab-CVC / M2PT
[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆99 Updated last year
all-things-vits / code-samples
Holds code for our CVPR'23 tutorial: All Things ViTs: Understanding and Interpreting Attention in Vision.
☆194 Updated 2 years ago
ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆241 Updated last year
ChenDelong1999 / subobjects
Official repository of paper "Subobject-level Image Tokenization" (ICML-25)
☆80 Updated 3 weeks ago
RobinWu218 / SimDINO
[ICML 2025] Official Implementation for SimDINO/SimDINOv2
☆156 Updated 4 months ago
kevin-ssy / CLIP_as_RNN
Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
☆108 Updated last year
lxa9867 / ImageFolder
High-performance Image Tokenizers for VAR and AR
☆278 Updated 3 months ago
ant-research / DreamLIP
[ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions
☆134 Updated 2 months ago
bfshi / scaling_on_scales
When do we not need larger vision models?
☆404 Updated 5 months ago
ViTAE-Transformer / QFormer
The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"
☆217 Updated last year
mihirp1998 / Diffusion-TTA
Diffusion-TTA improves pre-trained discriminative models such as image classifiers or segmentors using pre-trained generative models.
☆74 Updated last year
baaivision / DIVA
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
☆283 Updated 6 months ago
erow / FastSSL
☆39 Updated 4 months ago
LeapLabTHU / EfficientTrain
1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio…
☆222 Updated 11 months ago
Beckschen / ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
☆207 Updated last year
Malitha123 / awesome-video-self-supervised-learning
A curated list of awesome self-supervised learning methods in videos
☆149 Updated 2 weeks ago
PalAvik / hycoclip
Code for the paper "Compositional Entailment Learning for Hyperbolic Vision-Language Models".
☆77 Updated last month