Manchery / awesome-visual-tokenizer
[WIP] An up-to-date (2025) list of resources on visual tokenizers, primarily for visual generation. Give it a star if you find it useful.
20 stars, updated last year
Alternatives and similar repositories for awesome-visual-tokenizer
Users who are interested in awesome-visual-tokenizer are comparing it to the repositories listed below.
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision (205 stars, updated this week)
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling (41 stars, updated 11 months ago)
- (115 stars, updated 2 months ago)
- [ECCV 2024] Official pytorch implementation of "Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts" (47 stars, updated last year)
- (80 stars, updated 7 months ago)
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models (94 stars, updated last year)
- [ICLR 2026] Generative Universal Verifier as Multimodal Meta-Reasoner (44 stars, updated 2 months ago)
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation (74 stars, updated 4 months ago)
- (58 stars, updated 2 years ago)
- Official repository for LLaVA-Reward (ICCV 2025): Multimodal LLMs as Customized Reward Models for Text-to-Image Generation (22 stars, updated 6 months ago)
- Dimple, the first Discrete Diffusion Multimodal Large Language Model (114 stars, updated 6 months ago)
- A framework that allows you to apply Sparse Autoencoders to any model (50 stars, updated 6 months ago)
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT (117 stars, updated 3 months ago)
- The code repository of UniRL (51 stars, updated 8 months ago)
- [ICLR 2026] Autoregressive Image Generation with Randomized Parallel Decoding (85 stars, updated this week)
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient (108 stars, updated 4 months ago)
- Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24) (40 stars, updated last year)
- Official repository for ReasonGen-R1 (74 stars, updated 7 months ago)
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" (198 stars, updated 7 months ago)
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting (70 stars, updated 3 weeks ago)
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models (51 stars, updated 7 months ago)
- Codebase for the paper "Elucidating the design space of language models for image generation" (46 stars, updated last year)
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph (31 stars, updated 5 months ago)
- [CVPR 2024 Highlight] ImageNet-D (46 stars, updated last year)
- Official repository for the UAE paper, unified-GRPO, and unified-Bench (156 stars, updated 4 months ago)
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing (30 stars, updated last month)
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis (130 stars, updated 8 months ago)
- [CVPR 2025] HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation (61 stars, updated 6 months ago)
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… (79 stars, updated last year)
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding (37 stars, updated 10 months ago)