lcqysl / DiffThinker
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
☆98Updated this week
Alternatives and similar repositories for DiffThinker
Users who are interested in DiffThinker are comparing it to the libraries listed below.
- Official PyTorch implementation of TokenSet.☆127Updated 9 months ago
- [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models☆117Updated 2 weeks ago
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation☆128Updated this week
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆114Updated 4 months ago
- This is the official repository of InfiniteVL☆68Updated 3 weeks ago
- Test-time Scaling for VAR models☆28Updated 3 months ago
- Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.☆96Updated last week
- UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, a…☆133Updated 9 months ago
- Official implementation of "Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs".☆96Updated 2 months ago
- [arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning☆50Updated 3 weeks ago
- VCode: SVG as Symbolic Visual Representation☆116Updated 2 weeks ago
- Official repo for UAE☆125Updated last week
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆62Updated 8 months ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context☆214Updated last week
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)☆44Updated last month
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆202Updated 2 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆190Updated 2 weeks ago
- Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"☆100Updated last week
- The official repo of VideoAgentTrek☆39Updated 2 months ago
- Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"☆72Updated last month
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations☆192Updated 3 months ago
- [NeurIPS 2025 Oral] Exploring Diffusion Transformer Designs via Grafting☆69Updated 6 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆38Updated 6 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆186Updated 3 weeks ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆112Updated 2 months ago
- 🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"☆162Updated 5 months ago