lcqysl / DiffThinker
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
☆165Updated 3 weeks ago
Alternatives and similar repositories for DiffThinker
Users who are interested in DiffThinker are comparing it to the repositories listed below.
- [ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models☆125Updated last month
- Official repo for UAE☆155Updated last month
- NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation☆290Updated 3 weeks ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆206Updated 3 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated 2 months ago
- [ICLR'26] Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs☆96Updated this week
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆205Updated this week
- PyTorch implementation of NEPA☆296Updated last month
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆189Updated last month
- [ICLR 2026] Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusio…☆98Updated this week
- ☆63Updated 6 months ago
- A Large-scale Video Action Dataset☆341Updated 2 weeks ago
- This is the official repository of InfiniteVL☆76Updated last month
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible☆116Updated 5 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆137Updated 5 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations☆195Updated 4 months ago
- NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s Multimodal Intellige…☆598Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis☆62Updated 9 months ago
- Official PyTorch implementation of TokenSet.☆127Updated 10 months ago
- Test-time Scaling for VAR models☆30Updated 4 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie…☆351Updated 3 weeks ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆236Updated 3 weeks ago
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation☆31Updated 3 weeks ago
- Visual Planning: Let's Think Only with Images☆294Updated 8 months ago
- ☆35Updated last month
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (ICLR 2026)☆106Updated this week
- Cambrian-S: Towards Spatial Supersensing in Video☆482Updated last month
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning☆232Updated last week
- ☆162Updated last year
- [CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project☆184Updated 10 months ago