Manchery / awesome-visual-tokenizer
[WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful.
☆14 Updated 6 months ago
Alternatives and similar repositories for awesome-visual-tokenizer
Users who are interested in awesome-visual-tokenizer are comparing it to the repositories listed below.
- ☆22 Updated 3 weeks ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆37 Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆31 Updated 4 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆102 Updated last week
- [ICLR 2025] Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching ☆47 Updated 2 months ago
- ☆64 Updated 3 weeks ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆114 Updated 8 months ago
- A collection of vision foundation models unifying understanding and generation. ☆57 Updated 6 months ago
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models ☆38 Updated 4 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆45 Updated 2 weeks ago
- official code repo of CVPR 2025 paper PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation ☆38 Updated 4 months ago
- [ICML 2024] Compositional Image Decomposition with Diffusion Models ☆50 Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆18 Updated 4 months ago
- ☆50 Updated 7 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆59 Updated 2 months ago
- ☆30 Updated 7 months ago
- ☆12 Updated 6 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding ☆69 Updated 3 months ago
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆93 Updated last month
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆83 Updated 10 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model. ☆74 Updated this week
- (ICCV 2025) "Principal Components" Enable A New Language of Images ☆50 Updated last month
- ☆17 Updated 7 months ago
- the official repo for "D-AR: Diffusion via Autoregressive Models" ☆106 Updated 3 weeks ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆51 Updated 3 weeks ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis" ☆45 Updated last year
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ☆113 Updated 2 months ago
- ☆41 Updated last year
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆16 Updated 2 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆126 Updated 6 months ago