autonomousvision / mdpo
MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models
☆32 · Updated 3 weeks ago
Alternatives and similar repositories for mdpo
Users who are interested in mdpo are comparing it to the repositories listed below.
- A framework that allows you to apply Sparse AutoEncoder on any models ☆41 · Updated 3 months ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆154 · Updated 3 weeks ago
- ☆74 · Updated 3 months ago
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆86 · Updated last year
- ☆254 · Updated this week
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 · Updated 8 months ago
- Official implementation of "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization" ☆80 · Updated last year
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆89 · Updated 7 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆84 · Updated 2 months ago
- ☆61 · Updated 5 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆74 · Updated 3 months ago
- Official PyTorch implementation of the paper "Equivariant Image Modeling" (https://arxiv.org/abs/2503.18948) ☆34 · Updated 2 months ago
- Multimodal RewardBench ☆53 · Updated 7 months ago
- ☆49 · Updated 2 weeks ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆49 · Updated 2 months ago
- [ICLR 2025] Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆46 · Updated 5 months ago
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆73 · Updated 2 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆23 · Updated last year
- ☆39 · Updated 4 months ago
- ☆45 · Updated 9 months ago
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆61 · Updated 5 months ago
- ✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆75 · Updated 3 months ago
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆135 · Updated 2 months ago
- [ICML 2024] Compositional Image Decomposition with Diffusion Models ☆51 · Updated last year
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆70 · Updated last month
- ☆50 · Updated 10 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆77 · Updated 10 months ago
- ☆35 · Updated 6 months ago
- ☆25 · Updated 2 months ago
- ☆40 · Updated last year