facebookresearch / ToMe

A method to increase the speed and lower the memory footprint of existing vision transformers.

☆1,110

Alternatives and similar repositories for ToMe

Users who are interested in ToMe are comparing it to the libraries listed below.

google-research / pix2seq
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
☆930 Updated last year
facebookresearch / hiera
Hiera: A fast, powerful, and simple hierarchical vision transformer.
☆1,026 Updated last year
NVlabs / FasterViT
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
☆880 Updated 3 months ago
NVlabs / GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
☆773 Updated 3 years ago
snap-research / EfficientFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
☆1,081 Updated 2 years ago
google-research / vmoe
☆682 Updated 2 months ago
SHI-Labs / Neighborhood-Attention-Transformer
Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022
☆1,146 Updated last year
sail-sg / metaformer
MetaFormer Baselines for Vision (TPAMI 2024)
☆492 Updated last year
microsoft / FocalNet
[NeurIPS 2022] Official code for "Focal Modulation Networks"
☆741 Updated last year
czczup / ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
☆1,422 Updated 4 months ago
LAION-AI / CLIP_benchmark
CLIP-like model evaluation
☆779 Updated 2 months ago
bytedance / ibot
iBOT: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
☆748 Updated 3 years ago
mlfoundations / wise-ft
Robust fine-tuning of zero-shot models
☆744 Updated 3 years ago
NVlabs / GCVit
[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
☆440 Updated last year
lucidrains / CoCa-pytorch
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in PyTorch
☆1,181 Updated last year
sail-sg / Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
☆799 Updated 4 months ago
microsoft / X-Decoder
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
☆1,334 Updated 2 years ago
lucidrains / x-clip
A concise but complete implementation of CLIP with various experimental improvements from recent papers
☆716 Updated 2 years ago
facebookresearch / flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
☆428 Updated 2 years ago
NVlabs / ODISE
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
☆928 Updated last year
sail-sg / poolformer
PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
☆1,355 Updated last year
ma-xu / Context-Cluster
[ICLR 2023 Oral] Image as Set of Points
☆571 Updated last year
baaivision / EVA
EVA Series: Visual Representation Fantasies from BAAI
☆2,584 Updated last year
google-research / maxvit
[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmen…
☆487 Updated 2 years ago
facebookresearch / ConvNeXt-V2
Code release for ConvNeXt V2 model
☆1,853 Updated last year
yzhuoning / Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
☆1,218 Updated last year
raoyongming / DynamicViT
[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
☆630 Updated 2 years ago
LTH14 / mage
A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
☆569 Updated 2 years ago
microsoft / Cream
This is a collection of our NAS and Vision Transformer work.
☆1,806 Updated last year
allenai / visprog
Official code for VisProg (CVPR 2023 Best Paper!)
☆748 Updated last year