lcqysl / DiffThinker
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
☆166 Updated last month
Alternatives and similar repositories for DiffThinker
Users who are interested in DiffThinker are comparing it to the repositories listed below.
- Official repo for UAE ☆161 Updated last month
- [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models ☆128 Updated last month
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio… ☆98 Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆207 Updated 3 months ago
- A Large-scale Video Action Dataset ☆376 Updated 3 weeks ago
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models ☆77 Updated this week
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation ☆303 Updated 3 weeks ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 Updated last week
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆64 Updated last week
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆237 Updated last week
- Test-time Scaling for VAR models ☆31 Updated 4 months ago
- PyTorch implementation of NEPA ☆308 Updated 2 weeks ago
- Official PyTorch implementation of TokenSet. ☆127 Updated 10 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆77 Updated 2 weeks ago
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆207 Updated last week
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆119 Updated 5 months ago
- ☆35 Updated last month
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆137 Updated 5 months ago
- [ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti… ☆354 Updated last week
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆198 Updated 4 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning" ☆79 Updated 2 months ago
- VCode: SVG as Symbolic Visual Representation ☆122 Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆62 Updated 9 months ago
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation ☆32 Updated 3 weeks ago
- ☆63 Updated 6 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding ☆194 Updated last month
- Cambrian-S: Towards Spatial Supersensing in Video ☆488 Updated last month
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆240 Updated 6 months ago
- This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs). ☆275 Updated 2 weeks ago
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders ☆202 Updated this week