LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆73 Updated this week
Alternatives and similar repositories for limit-of-RLVR:
Users who are interested in limit-of-RLVR are comparing it to the repositories listed below:
- Code for "Reasoning to Learn from Latent Thoughts" ☆91 Updated 3 weeks ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆33 Updated 7 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆61 Updated this week
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆27 Updated last week
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆92 Updated this week
- ☆73 Updated 3 months ago
- The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark" ☆51 Updated this week
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 Updated last month
- ☆13 Updated 4 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling" ☆132 Updated 5 months ago
- ☆125 Updated 3 weeks ago
- A Self-Training Framework for Vision-Language Reasoning ☆76 Updated 3 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆65 Updated 2 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 Updated 10 months ago
- TTRL: Test-Time Reinforcement Learning ☆166 Updated this week
- ☆40 Updated 3 months ago
- Multimodal RewardBench ☆38 Updated 2 months ago
- [NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis ☆22 Updated 4 months ago
- [ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding ☆75 Updated 3 weeks ago
- ☆81 Updated this week
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆35 Updated 3 months ago
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆117 Updated last week
- ☆84 Updated 2 weeks ago
- ☆39 Updated last month
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆36 Updated last week
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆44 Updated 4 months ago
- ☆23 Updated last week
- ☆74 Updated this week
- ☆25 Updated last week
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆35 Updated 3 weeks ago