FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
★491 Updated last month
Alternatives and similar repositories for UniTok
Users who are interested in UniTok are comparing it to the repositories listed below.
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★412 Updated 4 months ago
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ★422 Updated 6 months ago
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ★270 Updated 8 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ★607 Updated last month
- Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning" ★228 Updated 8 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ★297 Updated 2 months ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think ★651 Updated last week
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ★234 Updated 6 months ago
- ★167 Updated 5 months ago
- [ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers ★421 Updated 2 weeks ago
- [ICCV 2025] Code release of "Harmonizing Visual Representations for Unified Multimodal Understanding and Generation" ★182 Updated 7 months ago
- Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ★284 Updated 8 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ★337 Updated this week
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ★195 Updated 2 weeks ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★411 Updated 8 months ago
- [CVPR 2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Mo… ★324 Updated 6 months ago
- Towards Scalable Pre-training of Visual Tokenizers for Generation ★357 Updated last week
- This is a repo to track the latest autoregressive visual generation papers. ★420 Updated 6 months ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ★202 Updated 5 months ago
- High-performance Image Tokenizers for VAR and AR ★300 Updated 8 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ★173 Updated last month
- DDT: Decoupled Diffusion Transformer ★343 Updated 4 months ago
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ★241 Updated 2 months ago
- [CVPR 2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ★184 Updated 9 months ago
- Structured Video Comprehension of Real-World Shorts ★227 Updated 3 months ago
- Official implementation of the paper "Transfer between Modalities with MetaQueries" ★281 Updated 2 months ago
- ★579 Updated last month
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ★192 Updated 3 months ago
- Implements VAR+CLIP for text-to-image (T2I) generation ★146 Updated 11 months ago
- ★122 Updated 4 months ago