FoundationVision / UniTok
A Unified Tokenizer for Visual Generation and Understanding
★354 · Updated last month
Alternatives and similar repositories for UniTok
Users who are interested in UniTok are comparing it to the repositories listed below.
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation" ★385 · Updated 3 weeks ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★351 · Updated this week
- VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning ★258 · Updated 3 months ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ★545 · Updated last month
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ★193 · Updated 2 months ago
- Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning ★190 · Updated last month
- [ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers ★301 · Updated 2 months ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ★263 · Updated 2 months ago
- ★457 · Updated last week
- [CVPR 2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ★166 · Updated 3 months ago
- Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction" ★230 · Updated 2 months ago
- ★111 · Updated 3 weeks ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ★166 · Updated 2 weeks ago
- ★174 · Updated 6 months ago
- High-performance Image Tokenizers for VAR and AR ★275 · Updated 2 months ago
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ★221 · Updated 2 months ago
- Official implementation of UnifiedReward & UnifiedReward-Think ★457 · Updated this week
- [CVPR 2025] PyTorch-based reimplementation of CrossFlow, as proposed in 'Flowing from Words to Pixels: A Noise-Free Framework for Cross-Mo… ★283 · Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ★126 · Updated last month
- This is a repo to track the latest autoregressive visual generation papers. ★369 · Updated 2 weeks ago
- ★128 · Updated 2 weeks ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ★141 · Updated last month
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation ★278 · Updated 3 months ago
- [ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models ★278 · Updated 2 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★362 · Updated 2 months ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ★313 · Updated last month
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ★177 · Updated 3 months ago
- Implements VAR+CLIP for text-to-image (T2I) generation ★141 · Updated 5 months ago
- ★211 · Updated last month
- ★154 · Updated 5 months ago