FoundationVision / UniTok
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
★408 · Updated last week
Alternatives and similar repositories for UniTok
Users who are interested in UniTok are comparing it to the repositories listed below.
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★384 · Updated last month
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ★409 · Updated 3 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ★289 · Updated last week
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ★264 · Updated 5 months ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ★212 · Updated 5 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ★575 · Updated 3 weeks ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ★219 · Updated 4 months ago
- ★156 · Updated 3 months ago
- High-performance Image Tokenizers for VAR and AR ★289 · Updated 5 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★389 · Updated 5 months ago
- [ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers ★349 · Updated 2 months ago
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ★235 · Updated 5 months ago
- Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ★261 · Updated 5 months ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ★187 · Updated 3 months ago
- Implements VAR+CLIP for text-to-image (T2I) generation ★147 · Updated 8 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ★151 · Updated last week
- This is a repo to track the latest autoregressive visual generation papers. ★399 · Updated 3 months ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think ★555 · Updated last week
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ★196 · Updated 2 months ago
- ★119 · Updated last month
- [CVPR 2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ★175 · Updated 6 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ★169 · Updated 4 months ago
- ★181 · Updated 9 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ★263 · Updated this week
- [CVPR 2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Mo… ★301 · Updated 3 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ★174 · Updated 2 weeks ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries ★240 · Updated 2 months ago
- Structured Video Comprehension of Real-World Shorts ★203 · Updated 2 weeks ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ★343 · Updated 4 months ago
- [ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models ★294 · Updated 5 months ago