ByteFlow-AI / TokenFlow
🔥 Official implementation of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
★234, updated last month
Alternatives and similar repositories for TokenFlow:
Users who are interested in TokenFlow are comparing it to the repositories listed below.
- XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation (★182, updated last week)
- Implements VAR+CLIP for text-to-image (T2I) generation (★116, updated this week)
- This is a repo to track the latest autoregressive visual generation papers. (★119, updated this week)
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization (★324, updated last week)
- Empowering Unified MLLM with Multi-granular Visual Generation (★115, updated 2 weeks ago)
- Liquid: Language Models are Scalable Multi-modal Generators (★60, updated last month)
- (★133, updated 2 weeks ago)
- The paper collections for the autoregressive models in vision. (★376, updated this week)
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation (★227, updated this week)
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. (★342, updated last week)
- This is the official implementation for ControlVAR. (★91, updated last month)
- [NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation (★229, updated this week)
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation (★61, updated last week)
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation (★212, updated last week)
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024] (★67, updated 2 months ago)
- [ICLR 2025] (★131, updated this week)
- [CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models (★152, updated 3 months ago)
- (★131, updated last month)
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations (★134, updated 7 months ago)
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" (★81, updated 3 months ago)
- [ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models (★183, updated last week)
- [NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models (★259, updated last month)
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models (★61, updated 8 months ago)
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions (★191, updated 6 months ago)
- CAR: Controllable AutoRegressive Modeling for Visual Generation (★96, updated 2 months ago)
- SpeeD: A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training (★165, updated this week)
- RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with t… (★115, updated 7 months ago)
- Collection of awesome generation acceleration resources. (★112, updated this week)
- [ICLR 2024] The official implementation of paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxi… (★226, updated 8 months ago)
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better (★247, updated last week)