ChaofanTao / Autoregressive-Models-in-Vision-Survey
[TMLR 2025🔥] A survey for the autoregressive models in vision.
☆462 · Updated last week
Alternatives and similar repositories for Autoregressive-Models-in-Vision-Survey
Users who are interested in Autoregressive-Models-in-Vision-Survey are comparing it to the repositories listed below:
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆443 · Updated 2 weeks ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ☆296 · Updated 3 weeks ago
- This is a repo to track the latest autoregressive visual generation papers. ☆193 · Updated this week
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆445 · Updated this week
- [CVPR 2025] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ☆595 · Updated this week
- This repo contains the code for 1D tokenizer and generator ☆794 · Updated last week
- High-performance Image Tokenizers for VAR and AR ☆228 · Updated last week
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆856 · Updated last month
- Collection of awesome generation acceleration resources. ☆182 · Updated this week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆255 · Updated 2 months ago
- Official implementation of Unified Reward Model for Multimodal Understanding and Generation. ☆225 · Updated this week
- ☆124 · Updated this week
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ☆447 · Updated this week
- Implements VAR+CLIP for text-to-image (T2I) generation ☆131 · Updated 2 months ago
- [ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think ☆894 · Updated 2 weeks ago
- Official repository for VisionZip (CVPR 2025) ☆262 · Updated last month
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ☆403 · Updated 2 months ago
- A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems ☆263 · Updated last month
- A collection of awesome video generation studies. ☆489 · Updated 3 weeks ago
- You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos. ☆336 · Updated 2 months ago
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 ☆1,428 · Updated 6 months ago
- Diffusion Model-Based Image Editing: A Survey (TPAMI 2025) ☆589 · Updated last week
- Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI ☆1,006 · Updated 2 weeks ago
- Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis ☆1,072 · Updated last month
- A paper list of some recent works about Token Compress for Vit and VLM ☆399 · Updated this week
- Scaling Diffusion Transformers with Mixture of Experts ☆304 · Updated 6 months ago
- Implementation of MagViT2 Tokenizer in Pytorch ☆597 · Updated 2 months ago
- [ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation. ☆1,314 · Updated this week
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey ☆314 · Updated this week
- This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generat… ☆152 · Updated last month