yfzhang114 / r1_reward
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆281 · May 9, 2025 · Updated 9 months ago
Alternatives and similar repositories for r1_reward
Users who are interested in r1_reward are comparing it to the repositories listed below.
- The Next Step Forward in Multimodal LLM Alignment ☆197 · May 1, 2025 · Updated 9 months ago
- ✨✨ [ICLR 2026] Think Beyond Images ☆578 · Sep 23, 2025 · Updated 4 months ago
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆159 · Jun 26, 2025 · Updated 7 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆105 · Sep 18, 2025 · Updated 4 months ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi… ☆77 · Apr 28, 2025 · Updated 9 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆43 · Apr 10, 2025 · Updated 10 months ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex ☆706 · Feb 10, 2026 · Updated last week
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ☆840 · May 14, 2025 · Updated 9 months ago
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation ☆31 · Mar 28, 2025 · Updated 10 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Dec 25, 2025 · Updated last month
- ☆47 · Apr 9, 2025 · Updated 10 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆416 · Oct 4, 2025 · Updated 4 months ago
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy ☆305 · May 14, 2025 · Updated 9 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆151 · Oct 21, 2025 · Updated 3 months ago
- Multimodal RewardBench ☆61 · Feb 21, 2025 · Updated 11 months ago
- Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning. ☆3,149 · Dec 15, 2025 · Updated 2 months ago
- ☆63 · Feb 4, 2026 · Updated last week
- [NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT ☆430 · Sep 18, 2025 · Updated 4 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆623 · Mar 18, 2025 · Updated 10 months ago
- [NeurIPS 2025] TTRL: Test-Time Reinforcement Learning ☆989 · Sep 26, 2025 · Updated 4 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆218 · Nov 27, 2025 · Updated 2 months ago
- repo for paper https://arxiv.org/abs/2504.13837 ☆327 · Dec 17, 2025 · Updated 2 months ago
- ☆813 · Jun 9, 2025 · Updated 8 months ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆768 · Sep 7, 2025 · Updated 5 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,319 · Oct 29, 2025 · Updated 3 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆33 · Jul 16, 2025 · Updated 7 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆75 · Sep 19, 2025 · Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆91 · Aug 8, 2025 · Updated 6 months ago
- Solve Visual Understanding with Reinforced VLMs ☆5,841 · Oct 21, 2025 · Updated 3 months ago
- ☆263 · May 14, 2025 · Updated 9 months ago
- Official Implementation of Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution ☆66 · Dec 8, 2025 · Updated 2 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆80 · Sep 19, 2025 · Updated 4 months ago
- Align Anything: Training All-modality Model with Feedback ☆4,632 · Nov 27, 2025 · Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆164 · Dec 26, 2024 · Updated last year
- ☆13 · Jan 22, 2025 · Updated last year
- ☆13 · Jul 10, 2024 · Updated last year
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,161 · Jul 15, 2025 · Updated 7 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆410 · Nov 21, 2025 · Updated 2 months ago