fscdc / RewardMap
[ICLR 2026] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
☆37 · Updated last week
Alternatives and similar repositories for RewardMap
Users who are interested in RewardMap are comparing it to the repositories listed below.
- [NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆64 · Updated 4 months ago
- [NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆70 · Updated 3 months ago
- TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models ☆64 · Updated 2 months ago
- ☆31 · Updated 6 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆207 · Updated 3 months ago
- ☆59 · Updated 5 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated last week
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆114 · Updated 6 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆78 · Updated 2 months ago
- ☆63 · Updated last month
- We introduce BabyVision, a benchmark revealing the infancy of AI vision. ☆173 · Updated 3 weeks ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆237 · Updated last week
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆77 · Updated 2 weeks ago
- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ☆81 · Updated 6 months ago
- Visual Planning: Let's Think Only with Images ☆294 · Updated 8 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆90 · Updated 6 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆41 · Updated 11 months ago
- Official repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models. ☆71 · Updated 3 months ago
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models ☆77 · Updated this week
- ✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆79 · Updated 6 months ago
- Official repo for UAE ☆161 · Updated last month
- [ICLR 2026] SparseD: Sparse Attention for Diffusion Language Models ☆57 · Updated 4 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆64 · Updated last week
- MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence ☆54 · Updated last month
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation ☆46 · Updated 5 months ago
- A Collection of Papers on Diffusion Language Models ☆155 · Updated 4 months ago
- ☆44 · Updated 2 months ago
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders ☆188 · Updated last week
- Visual Spatial Tuning ☆171 · Updated this week
- ☆142 · Updated 2 weeks ago