hao-ai-lab / JacobiForcing
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
☆120 Updated this week
Alternatives and similar repositories for JacobiForcing
Users who are interested in JacobiForcing are comparing it to the repositories listed below.
- The official repo of the paper "MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly" ☆169 Updated last month
- 🔥 JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization ☆102 Updated this week
- Multi-Reward as Condition for Instruction-Based Image Editing ☆57 Updated 9 months ago
- Official repository for the paper "TIIF-Bench: How Does Your T2I Model Follow Your Instructions?". ☆157 Updated last month
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models ☆142 Updated 7 months ago
- Repository for awesome spatial/visual reasoning MLLMs. (focus more on embodied applications) ☆72 Updated 5 months ago
- [NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory ☆91 Updated last month
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization ☆443 Updated this week
- [NeurIPS 2025] Native-resolution diffusion Transformer ☆294 Updated 2 months ago
- Efficient implementations of Native Sparse Attention ☆1,037 Updated 2 months ago
- Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos ☆303 Updated 2 months ago
- Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) ☆675 Updated 2 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆181 Updated 3 weeks ago
- (AAAI 2025) MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration ☆42 Updated 6 months ago
- ☆297 Updated 2 months ago
- ☆65 Updated 7 months ago
- [NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent ☆725 Updated last week
- TempFlow-GRPO (Temporal Flow GRPO), a principled GRPO framework that captures and exploits the temporal structure inherent in flow-based … ☆830 Updated 3 weeks ago
- Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning ☆228 Updated 8 months ago
- The code repository of UniRL ☆47 Updated 6 months ago
- [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆78 Updated 5 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding ☆181 Updated 2 weeks ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆108 Updated last month
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆193 Updated 2 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations ☆192 Updated 3 months ago
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference ☆214 Updated 2 months ago
- [CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis ☆128 Updated 7 months ago
- This is a framework for evaluating reasoning in foundational Video Models. ☆45 Updated this week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆187 Updated last month
- [NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models acro… ☆91 Updated 2 weeks ago