fscdc / RewardMap
[arxiv 2025] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
☆34 Updated last month
Alternatives and similar repositories for RewardMap
Users who are interested in RewardMap are comparing it to the repositories listed below.
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆63 Updated 2 months ago
- Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models. ☆59 Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆84 Updated 4 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆193 Updated 2 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆76 Updated last month
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆54 Updated 5 months ago
- ☆55 Updated 4 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆108 Updated last month
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆32 Updated 6 months ago
- [NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆66 Updated 2 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆113 Updated 5 months ago
- TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models ☆62 Updated 3 weeks ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆429 Updated this week
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆91 Updated 3 weeks ago
- ☆29 Updated 4 months ago
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆70 Updated last month
- ☆59 Updated 9 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 Updated last year
- Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆142 Updated this week
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆40 Updated 10 months ago
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence ☆36 Updated this week
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆408 Updated 2 weeks ago
- Visual Planning: Let's Think Only with Images ☆285 Updated 7 months ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆164 Updated 2 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆243 Updated last month
- ☆108 Updated last month
- ☆152 Updated 3 weeks ago
- Visual Spatial Tuning ☆154 Updated 2 weeks ago
- A collection of vision foundation models unifying understanding and generation. ☆59 Updated 11 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆207 Updated 4 months ago