FoundationVision / UniTok
A Unified Tokenizer for Visual Generation and Understanding
★210 Updated 3 weeks ago
Alternatives and similar repositories for UniTok:
Users who are interested in UniTok are comparing it to the libraries listed below.
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ★429 Updated this week
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★292 Updated 3 weeks ago
- [CVPR2025] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project/ ★127 Updated last week
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ★167 Updated this week
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video … ★101 Updated this week
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ★256 Updated 3 weeks ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★249 Updated 2 months ago
- [NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models ★268 Updated 3 months ago
- [ICLR2025] ★140 Updated 2 months ago
- ★139 Updated 2 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ★313 Updated 3 weeks ago
- VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE ★301 Updated 2 months ago
- [ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models ★210 Updated 2 months ago
- Official implementation of Unified Reward Model for Multimodal Understanding and Generation. ★214 Updated last week
- High-performance Image Tokenizers for VAR and AR ★226 Updated last week
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ★143 Updated 3 weeks ago
- Implements VAR+CLIP for text-to-image (T2I) generation ★129 Updated 2 months ago
- ★147 Updated 3 months ago
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ★142 Updated 4 months ago
- ★191 Updated last month
- Empowering Unified MLLM with Multi-granular Visual Generation ★119 Updated 2 months ago
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation ★190 Updated last month
- ★167 Updated last month
- official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ★82 Updated last month
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations ★137 Updated last month
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ★86 Updated 5 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ★269 Updated 2 months ago
- This is a repo to track the latest autoregressive visual generation papers. ★178 Updated this week
- Multimodal Models in Real World ★452 Updated last month
- Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions" ★420 Updated 6 months ago