fscdc / RewardMap
[arxiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
☆36 · Updated 2 months ago
Alternatives and similar repositories for RewardMap
Users interested in RewardMap are comparing it to the libraries listed below.
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆63 · Updated 3 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆76 · Updated last month
- ☆57 · Updated 4 months ago
- ☆38 · Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆112 · Updated 2 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆87 · Updated 5 months ago
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence ☆51 · Updated this week
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆202 · Updated 2 months ago
- ☆30 · Updated 5 months ago
- Visual Planning: Let's Think Only with Images ☆290 · Updated 7 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆114 · Updated 6 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆33 · Updated 6 months ago
- TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models ☆64 · Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆198 · Updated 8 months ago
- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ☆80 · Updated 5 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆35 · Updated last year
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆56 · Updated 6 months ago
- This is the official repository of InfiniteVL ☆68 · Updated 3 weeks ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆234 · Updated this week
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆105 · Updated last month
- ✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆78 · Updated 6 months ago
- ☆49 · Updated this week
- ☆162 · Updated last month
- [NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆68 · Updated 3 months ago
- ☆304 · Updated 3 weeks ago
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆136 · Updated 2 weeks ago
- ☆65 · Updated 2 months ago
- Official repository of Vision Test-Time Training ☆48 · Updated last month
- Cambrian-S: Towards Spatial Supersensing in Video ☆468 · Updated 2 weeks ago
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆36 · Updated 6 months ago