ByteFlow-AI / TokenFlow
🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
★164 · Updated this week
Alternatives and similar repositories for TokenFlow:
Users who are interested in TokenFlow are comparing it to the repositories listed below.
- XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation ★149 · Updated last week
- Implements VAR+CLIP for text-to-image (T2I) generation ★94 · Updated 2 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ★111 · Updated 2 months ago
- This is a repo to track the latest autoregressive visual generation papers. ★71 · Updated last week
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ★78 · Updated 2 months ago
- ★129 · Updated last month
- 🔥 stable, simple, state-of-the-art VQVAE toolkit & cookbook ★55 · Updated 5 months ago
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ★266 · Updated last week
- Official implementation of MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis ★84 · Updated 5 months ago
- [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models ★246 · Updated 2 months ago
- ★206 · Updated 5 months ago
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★177 · Updated last month
- Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024] ★55 · Updated 2 weeks ago
- Official implementation of the Law of Vision Representation in MLLMs ★139 · Updated last month
- This is the official implementation for ControlVAR. ★68 · Updated last week
- CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient ★73 · Updated 2 weeks ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation ★51 · Updated 3 months ago
- The paper collections for the autoregressive models in vision. ★310 · Updated this week
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ★103 · Updated 7 months ago
- [NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation ★219 · Updated 3 weeks ago
- CAR: Controllable AutoRegressive Modeling for Visual Generation ★73 · Updated 2 weeks ago
- FQGAN: Factorized Visual Tokenization and Generation ★36 · Updated 2 weeks ago
- The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ★77 · Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ★125 · Updated last week
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ★84 · Updated 3 months ago
- Official code for "ControlAR: Controllable Image Generation with Autoregressive Models" ★159 · Updated this week
- STAR: Scale-wise Text-to-image generation via Auto-Regressive representations ★131 · Updated 6 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ★57 · Updated last month
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ★121 · Updated 11 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ★54 · Updated 6 months ago