ModalMinds / MM-PRM
MM-PRM: An open implementation of Multimodal OmegaPRM and its corresponding training pipeline
☆13 · Updated last month
Alternatives and similar repositories for MM-PRM
Users interested in MM-PRM are comparing it to the repositories listed below.
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆25 · Updated last month
- A Self-Training Framework for Vision-Language Reasoning ☆77 · Updated 3 months ago
- Official implementation of MIA-DPO ☆57 · Updated 3 months ago
- Code release for VTW (AAAI 2025 Oral) ☆39 · Updated 3 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆53 · Updated last week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆94 · Updated 2 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆56 · Updated 10 months ago
- Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing" ☆44 · Updated last month
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆104 · Updated 3 weeks ago
- [ICML 2025] Official implementation of the paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in… ☆50 · Updated this week
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆55 · Updated 8 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models ☆59 · Updated last month
- [Blog 1] Recording a bug in grpo_trainer in some R1 projects ☆20 · Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆83 · Updated last month
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆40 · Updated 2 weeks ago
- Official code for "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆90 · Updated last week
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 · Updated 2 months ago
- [arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 · Updated 5 months ago
- [ICLR 2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding ☆77 · Updated last month
- A collection of papers and projects on multimodal reasoning ☆104 · Updated 2 weeks ago
- [CVPR '25] Interleaved-Modal Chain-of-Thought ☆39 · Updated 3 weeks ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆104 · Updated last week
- ✈️ Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆66 · Updated last month
- A continuously updated collection of the latest papers, technical reports, and benchmarks on multimodal reasoning ☆37 · Updated last month