hustvl / DiffusionVL
[ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
☆128 Updated last month
Alternatives and similar repositories for DiffusionVL
Users who are interested in DiffusionVL are comparing it to the repositories listed below.
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation ☆303 Updated last month
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models ☆169 Updated last month
- This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs). ☆275 Updated 2 weeks ago
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models ☆77 Updated this week
- ☆35 Updated last month
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning ☆157 Updated 8 months ago
- Official repo for UAE ☆161 Updated last month
- [ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti… ☆354 Updated last week
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆64 Updated last week
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs ☆96 Updated 2 weeks ago
- ☆63 Updated 6 months ago
- Step3-VL-10B: A compact yet frontier multimodal model achieving SOTA performance at the 10B scale, matching open-source models 10-20x its… ☆390 Updated 2 weeks ago
- PyTorch implementation of NEPA ☆308 Updated 2 weeks ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆129 Updated last month
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆109 Updated last month
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆77 Updated 2 weeks ago
- VideoCoF: Unified Video Editing with Temporal Reasoner ☆134 Updated last month
- A Large-scale Video Action Dataset ☆388 Updated 3 weeks ago
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ☆184 Updated 10 months ago
- ☆317 Updated 2 weeks ago
- Towards Scalable Pre-training of Visual Tokenizers for Generation ☆437 Updated last month
- ☆88 Updated last month
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio… ☆98 Updated this week
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure? ☆216 Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 Updated last week
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation ☆98 Updated 3 weeks ago
- GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning ☆103 Updated last week
- Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058). ☆420 Updated 5 months ago
- ThinkGen: Generalized Thinking for Visual Generation ☆48 Updated last month
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆156 Updated 4 months ago