hustvl / DiffusionVL
[ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
☆127Updated last month
Alternatives and similar repositories for DiffusionVL
Users who are interested in DiffusionVL are comparing it to the repositories listed below.
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models☆165Updated 3 weeks ago
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation☆290Updated 3 weeks ago
- This is the official repository of InfiniteVL☆76Updated last month
- ☆35Updated last month
- This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).☆270Updated last week
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning☆156Updated 7 months ago
- Official repo for UAE☆155Updated last month
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆48Updated this week
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs☆96Updated this week
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆108Updated last month
- Towards Scalable Pre-training of Visual Tokenizers for Generation☆428Updated last month
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆74Updated last week
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆191Updated last month
- ☆63Updated 6 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie…☆351Updated 3 weeks ago
- PyTorch implementation of NEPA☆303Updated last week
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio…☆98Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆206Updated 3 months ago
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure?☆209Updated last month
- Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"☆123Updated last month
- A Large-scale Video Action Dataset☆341Updated 2 weeks ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆238Updated 5 months ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆156Updated 4 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆137Updated 5 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25]☆269Updated 2 months ago
- Cambrian-S: Towards Spatial Supersensing in Video☆482Updated last month
- Incentivizing "Thinking with Long Videos" via Native Tool Calling☆183Updated this week
- The official repository of "Astra : General Interactive World Model with Autoregressive Denoising"☆197Updated 2 weeks ago
- ☆58Updated 8 months ago
- ☆491Updated last month