ChaofanTao / Autoregressive-Models-in-Vision-Survey
The paper collections for the autoregressive models in vision.
★419 · Updated this week
Alternatives and similar repositories for Autoregressive-Models-in-Vision-Survey:
Users that are interested in Autoregressive-Models-in-Vision-Survey are comparing it to the libraries listed below
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation". ★273 · Updated this week
- This is a repository for organizing papers, codes and other resources related to unified multimodal models. ★387 · Updated last month
- This is a repo to track the latest autoregressive visual generation papers. ★150 · Updated this week
- SEED-Voken: A Series of Powerful Visual Tokenizers ★838 · Updated last week
- [ICLR25] High-performance Image Tokenizers for VAR and AR ★206 · Updated 2 weeks ago
- This repo contains the code for 1D tokenizer and generator ★694 · Updated last week
- 🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio). ★434 · Updated last week
- Implements VAR+CLIP for text-to-image (T2I) generation ★123 · Updated last month
- A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems ★249 · Updated this week
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey ★370 · Updated last month
- Collection of awesome generation acceleration resources. ★155 · Updated last week
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ★232 · Updated last month
- Diffusion Model-Based Image Editing: A Survey (arXiv) ★565 · Updated this week
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ★394 · Updated this week
- A paper list of some recent works about Token Compress for Vit and VLM ★338 · Updated 3 weeks ago
- Official repository for VisionZip (CVPR 2025) ★240 · Updated this week
- A reading list of video generation ★503 · Updated last week
- A collection of awesome video generation studies. ★465 · Updated last month
- Official Pytorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICL… ★845 · Updated last month
- You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos. ★307 · Updated last month
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer ★420 · Updated 4 months ago
- Scaling Diffusion Transformers with Mixture of Experts ★275 · Updated 5 months ago
- PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838 ★1,307 · Updated 5 months ago
- [ICLR2024] The official implementation of paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxi… ★227 · Updated 9 months ago
- [ICLR 2025] Diffusion Feedback Helps CLIP See Better ★262 · Updated last month
- [CVPR 2025] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models ★277 · Updated this week
- A collection of awesome text-to-image generation studies. ★531 · Updated 2 weeks ago
- [ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation ★247 · Updated this week
- A Collection of Papers and Codes for CVPR2025/CVPR2024/ECCV2024 AIGC ★482 · Updated this week