bruno686 / Awesome-RL-based-LLM-Reasoning
Awesome RL-based LLM Reasoning
☆352Updated this week
Alternatives and similar repositories for Awesome-RL-based-LLM-Reasoning:
Users who are interested in Awesome-RL-based-LLM-Reasoning are comparing it to the repositories listed below
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆484Updated last week
- ☆186Updated this week
- Paper list for Efficient Reasoning.☆331Updated this week
- Paper List of Inference/Test Time Scaling/Computing☆131Updated this week
- SOTA RL fine-tuning solution for advanced math reasoning of LLM☆92Updated this week
- A Survey on Efficient Reasoning for LLMs☆116Updated this week
- A series of technical reports on Slow Thinking with LLM☆595Updated last week
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆153Updated this week
- MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning☆425Updated last week
- ☆507Updated 2 months ago
- Latest Advances on System-2 Reasoning☆836Updated last week
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆125Updated 3 months ago
- Explore the Multimodal “Aha Moment” on 2B Model☆524Updated last week
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆357Updated 2 months ago
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆643Updated this week
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆183Updated this week
- ☆117Updated 2 weeks ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆597Updated 2 months ago
- A journey to a real multimodal R1! We are running large-scale experiments☆280Updated 3 weeks ago
- This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta…☆404Updated this week
- ☆71Updated last week
- Chain of Thought (CoT) is so hot! So long! We need shorter reasoning processes!☆43Updated 2 weeks ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆94Updated 2 weeks ago
- Related works and background techniques about OpenAI o1☆217Updated 2 months ago
- Collect every awesome work about r1!☆306Updated last week
- ☆262Updated last week
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆301Updated last month
- Survey on LLM Agents (Published at COLING 2025)☆174Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆216Updated this week
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆299Updated 7 months ago