naver-ai / rope-vit
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
☆296 Updated 3 months ago
Alternatives and similar repositories for rope-vit:
Users who are interested in rope-vit are comparing it to the repositories listed below:
- Open source implementation of "Vision Transformers Need Registers" ☆166 Updated 2 months ago
- [ICLR2025] Halton Scheduler for Masked Generative Image Transformer ☆206 Updated last month
- [NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers" ☆194 Updated 6 months ago
- Code for Fast Training of Diffusion Models with Masked Transformers ☆397 Updated 10 months ago
- An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivatio… ☆81 Updated last year
- This repo contains the code for 1D tokenizer and generator ☆804 Updated 2 weeks ago
- When do we not need larger vision models? ☆383 Updated last month
- Implementation of Autoregressive Diffusion in Pytorch ☆366 Updated 5 months ago
- Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023) ☆554 Updated 11 months ago
- High-performance Image Tokenizers for VAR and AR ☆233 Updated last week
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆219 Updated 7 months ago
- The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention" ☆202 Updated 11 months ago
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ☆152 Updated last month
- This is the official code release for our work, Denoising Vision Transformers. ☆357 Updated 4 months ago
- [ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures ☆437 Updated last month
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆225 Updated 2 months ago
- ☆87 Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆857 Updated last month
- The official implementation of DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis ☆186 Updated 9 months ago
- A collection of resources and papers on Vector Quantized Variational Autoencoder (VQ-VAE) and its application ☆262 Updated 2 months ago
- ☆170 Updated last month
- Neighborhood Attention Extension. Bringing attention to a neighborhood near you! ☆430 Updated 2 weeks ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆597 Updated 2 months ago
- ☆121 Updated 9 months ago
- [CVPR 2025] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive… ☆238 Updated 2 months ago
- MoVQGAN - model for the image encoding and reconstruction ☆226 Updated last year
- Scaling Diffusion Transformers with Mixture of Experts ☆304 Updated 6 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆99 Updated last year
- A PyTorch implementation of the paper "ZigMa: A DiT-Style Mamba-based Diffusion Model" (ECCV 2024) ☆301 Updated 2 weeks ago
- Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch ☆335 Updated 2 months ago