Manchery / awesome-visual-tokenizer
[WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful.
⭐13 · Updated 5 months ago
Alternatives and similar repositories for awesome-visual-tokenizer
Users who are interested in awesome-visual-tokenizer are comparing it to the repositories listed below.
- Official Repository of Personalized Visual Instruct Tuning ⭐29 · Updated 3 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ⭐15 · Updated 2 months ago
- Codebase for the paper "Elucidating the Design Space of Language Models for Image Generation" ⭐45 · Updated 7 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ⭐67 · Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" ⭐43 · Updated 11 months ago
- ⭐37 · Updated last month
- The official repo of continuous speculative decoding ⭐27 · Updated 3 months ago
- Official implementation of LaVin-DiT ⭐34 · Updated 5 months ago
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24). ⭐38 · Updated last year
- Training code for CLIP-FlanT5 ⭐26 · Updated 10 months ago
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models ⭐51 · Updated 3 weeks ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ⭐30 · Updated 8 months ago
- ⭐11 · Updated last month
- [CVPR 2024 Highlight] ImageNet-D ⭐43 · Updated 8 months ago
- Codes accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment" ⭐33 · Updated 4 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ⭐35 · Updated 4 months ago
- ⭐41 · Updated 11 months ago
- ⭐11 · Updated 5 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ⭐39 · Updated 3 weeks ago
- ⭐21 · Updated 7 months ago
- Official Implementation for "Editing Massive Concepts in Text-to-Image Diffusion Models" ⭐19 · Updated last year
- [ICLR 2025] Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ⭐39 · Updated 2 months ago
- ⭐44 · Updated 5 months ago
- [NIPS 2023] Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector ⭐37 · Updated last year
- Minimal multi-GPU implementation of EDM2: "Analyzing and Improving the Training Dynamics of Diffusion Models" ⭐32 · Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR 2025] ⭐18 · Updated 4 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ⭐27 · Updated last year
- ⭐12 · Updated 2 months ago
- ⭐37 · Updated last month
- Fixed official code for the paper "A Closer Look at Parameter-Efficient Tuning in Diffusion Models". ⭐41 · Updated 2 years ago