hustvl / DiffusionVL
[ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
☆68Updated this week
Alternatives and similar repositories for DiffusionVL
Users who are interested in DiffusionVL are comparing it to the repositories listed below.
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning☆150Updated 6 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie…☆334Updated this week
- ☆166Updated 5 months ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning☆101Updated 6 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆151Updated 3 months ago
- ☆25Updated this week
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"☆299Updated 2 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations☆192Updated 3 months ago
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project☆184Updated 9 months ago
- Official repository for ReasonGen-R1☆73Updated 5 months ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆182Updated 7 months ago
- ☆138Updated 2 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆91Updated 3 weeks ago
- Towards Scalable Pre-training of Visual Tokenizers for Generation☆133Updated this week
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"☆195Updated last week
- Cambrian-S: Towards Spatial Supersensing in Video☆429Updated this week
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing☆127Updated this week
- This is the official repository of InfiniteVL☆54Updated this week
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure?☆38Updated last week
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning☆194Updated this week
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆100Updated 2 weeks ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆135Updated last month
- ☆121Updated 4 months ago
- Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer☆130Updated 2 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.☆95Updated last month
- ☆91Updated last week
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform…☆80Updated 3 months ago
- PyTorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"☆421Updated 6 months ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR"☆201Updated 5 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆235Updated 4 months ago