✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆283 · May 9, 2025 · Updated 10 months ago
Alternatives and similar repositories for r1_reward
Users that are interested in r1_reward are comparing it to the libraries listed below.
- The Next Step Forward in Multimodal LLM Alignment ☆199 · May 1, 2025 · Updated 10 months ago
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation ☆31 · Mar 28, 2025 · Updated last year
- ✨✨ [ICLR 2026] Think Beyond Images ☆590 · Sep 23, 2025 · Updated 6 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆43 · Apr 10, 2025 · Updated 11 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆32 · Jul 16, 2025 · Updated 8 months ago
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi… ☆77 · Apr 28, 2025 · Updated 11 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆155 · Oct 21, 2025 · Updated 5 months ago
- ✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy ☆306 · May 14, 2025 · Updated 10 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆107 · Sep 18, 2025 · Updated 6 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆164 · Dec 26, 2024 · Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆161 · Jun 26, 2025 · Updated 9 months ago
- Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning. ☆3,165 · Dec 15, 2025 · Updated 3 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆77 · Sep 19, 2025 · Updated 6 months ago
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex ☆748 · Mar 19, 2026 · Updated last week
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks. ☆846 · May 14, 2025 · Updated 10 months ago
- Multimodal RewardBench ☆66 · Feb 21, 2025 · Updated last year
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding ☆54 · Mar 24, 2026 · Updated last week
- ✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model ☆677 · May 24, 2025 · Updated 10 months ago
- Align Anything: Training All-modality Model with Feedback ☆4,636 · Nov 27, 2025 · Updated 4 months ago
- ✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi… ☆406 · Jan 14, 2026 · Updated 2 months ago
- [NeurIPS 2025] T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT ☆432 · Sep 18, 2025 · Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆426 · Mar 20, 2026 · Updated last week
- ☆813 · Jun 9, 2025 · Updated 9 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆81 · Dec 25, 2025 · Updated 3 months ago
- ☆66 · Feb 4, 2026 · Updated last month
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆2,321 · Oct 29, 2025 · Updated 5 months ago
- Explore the Multimodal “Aha Moment” on 2B Model ☆623 · Mar 18, 2025 · Updated last year
- [NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,242 · Jan 16, 2026 · Updated 2 months ago
- Solve Visual Understanding with Reinforced VLMs ☆5,898 · Mar 12, 2026 · Updated 2 weeks ago
- A fork to add multimodal model training to open-r1 ☆1,514 · Feb 8, 2025 · Updated last year
- [NeurIPS 2025] TTRL: Test-Time Reinforcement Learning ☆1,034 · Mar 11, 2026 · Updated 2 weeks ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning ☆772 · Sep 7, 2025 · Updated 6 months ago
- repo for paper https://arxiv.org/abs/2504.13837 ☆333 · Dec 17, 2025 · Updated 3 months ago
- Witness the aha moment of VLM with less than $3. ☆4,046 · May 19, 2025 · Updated 10 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆83 · Sep 19, 2025 · Updated 6 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward ☆93 · Aug 8, 2025 · Updated 7 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,171 · Jul 15, 2025 · Updated 8 months ago
- ☆267 · May 14, 2025 · Updated 10 months ago
- Online Preference Alignment for Language Models via Count-based Exploration ☆17 · Jan 14, 2025 · Updated last year