naver-ai / rope-vit
[ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer"
☆270 · Updated last month
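For readers new to the technique, the following is a minimal sketch of 2D axial rotary position embedding. It is not this repository's code. The helper names (`rope_1d`, `rope_axial_2d`, `dot`) are hypothetical. The frequency base of 10000 is borrowed from the original 1D RoPE rather than taken from rope-vit. Splitting the channels into an x-half and a y-half is assumed here as one simple way to build an axial variant. The final check shows the property RoPE is designed for: the query-key dot product depends only on the relative offset between two patch positions.

```python
import math
from typing import List, Sequence


def rope_1d(x: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate each channel pair (x[2k], x[2k+1]) by angle pos * base**(-2k/d).

    The base value is an assumption (the 1D RoPE default), not rope-vit's setting.
    """
    d = len(x)
    if d % 2:
        raise ValueError("dimension must be even")
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2k, so the exponent is -2k/d
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        # 2x2 rotation of the pair (a, b)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rope_axial_2d(x: Sequence[float], px: float, py: float,
                  base: float = 10000.0) -> List[float]:
    """Hypothetical axial 2D RoPE: first half of the channels encodes the
    column index px, second half encodes the row index py."""
    d = len(x)
    if d % 4:
        raise ValueError("dimension must be divisible by 4")
    half = d // 2
    return rope_1d(x[:half], px, base) + rope_1d(x[half:], py, base)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 1.5, -0.4]
    k = [1.1, 0.2, -0.9, 0.6, 0.8, -1.3, 0.05, 0.9]
    # Query at patch (1, 2), key at patch (3, 5): offset (2, 3)
    s1 = dot(rope_axial_2d(q, 1, 2), rope_axial_2d(k, 3, 5))
    # Same offset (2, 3), different absolute positions
    s2 = dot(rope_axial_2d(q, 0, 0), rope_axial_2d(k, 2, 3))
    print(abs(s1 - s2) < 1e-9)
```

Because every channel pair is only rotated, vector norms are preserved, and the attention score between a rotated query and a rotated key reduces to a function of the positional difference. This is why RoPE-style encodings are often described as relative position encodings.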
Alternatives and similar repositories for rope-vit:
Users interested in rope-vit are comparing it to the libraries listed below.
- Open-source implementation of "Vision Transformers Need Registers" ☆161 · Updated this week
- Unofficial MaskGIT reproduction in PyTorch ☆182 · Updated 11 months ago
- This repo contains the code for a 1D tokenizer and generator ☆667 · Updated this week
- When do we not need larger vision models? ☆357 · Updated last month
- MoVQGAN: a model for image encoding and reconstruction ☆215 · Updated last year
- My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution" ☆210 · Updated this week
- This is the official code release for our work, Denoising Vision Transformers. ☆352 · Updated 2 months ago
- Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023) ☆544 · Updated 9 months ago
- Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch ☆291 · Updated 2 weeks ago
- [NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers" ☆178 · Updated 4 months ago
- An efficient PyTorch implementation of selective scan in one file; works on both CPU and GPU, with corresponding mathematical derivatio… ☆78 · Updated 10 months ago
- XQ-GAN🚀: An Open-source Image Tokenization Framework for Autoregressive Generation ☆182 · Updated last week
- Code for Fast Training of Diffusion Models with Masked Transformers ☆386 · Updated 8 months ago
- Scaling Diffusion Transformers with Mixture of Experts ☆245 · Updated 4 months ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆257 · Updated 9 months ago
- Official PyTorch implementation of our CVPR 2023 paper: "Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dyna… ☆166 · Updated last year
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆212 · Updated last week
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆217 · Updated 5 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆816 · Updated last week
- Official implementation of the 'CLIP-DINOiser: Teaching CLIP a few DINO tricks' paper. ☆230 · Updated 3 months ago
- The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention" ☆194 · Updated 9 months ago
- ☆112 · Updated 7 months ago
- Implementation of Autoregressive Diffusion in PyTorch ☆348 · Updated 2 months ago
- [ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model ☆395 · Updated 2 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆170 · Updated last month
- Neighborhood Attention Extension. Bringing attention to a neighborhood near you! ☆396 · Updated 3 weeks ago
- Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures ☆398 · Updated 2 months ago
- VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks ☆376 · Updated 6 months ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆166 · Updated 7 months ago
- [arXiv'25] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ☆214 · Updated 2 weeks ago