hustvl / DiffusionVL
[ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
☆121 · Updated 2 weeks ago
Alternatives and similar repositories for DiffusionVL
Users who are interested in DiffusionVL are comparing it to the libraries listed below.
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation ☆128 · Updated this week
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models ☆98 · Updated last week
- ☆35 · Updated 3 weeks ago
- This is the official repository of InfiniteVL ☆68 · Updated 3 weeks ago
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning ☆151 · Updated 7 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence" ☆127 · Updated 3 weeks ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs". ☆96 · Updated 2 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆102 · Updated 2 weeks ago
- This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs). ☆260 · Updated this week
- ☆63 · Updated 6 months ago
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure? ☆174 · Updated 3 weeks ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆343 · Updated this week
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆134 · Updated 4 months ago
- PyTorch implementation of NEPA ☆262 · Updated 2 weeks ago
- Official repository for the UAE paper, unified-GRPO, and unified-Bench ☆152 · Updated 3 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆114 · Updated 5 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆468 · Updated 2 weeks ago
- Towards Scalable Pre-training of Visual Tokenizers for Generation ☆405 · Updated 3 weeks ago
- ☆81 · Updated 3 weeks ago
- ☆169 · Updated 6 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆112 · Updated 2 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆57 · Updated this week
- ☆81 · Updated last month
- ☆57 · Updated 7 months ago
- ☆298 · Updated this week
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project ☆184 · Updated 9 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆192 · Updated 3 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆260 · Updated 2 months ago
- Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆166 · Updated this week
- The official repository of "Astra: General Interactive World Model with Autoregressive Denoising" ☆181 · Updated last week