Qihoo360 / Light-R1
☆679Updated 3 weeks ago
Alternatives and similar repositories for Light-R1:
Users who are interested in Light-R1 are comparing it to the libraries listed below:
- ☆739Updated 2 weeks ago
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning☆441Updated 2 weeks ago
- A series of technical reports on Slow Thinking with LLMs☆659Updated 3 weeks ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆495Updated 2 weeks ago
- Large Reasoning Models☆804Updated 5 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments.☆312Updated 3 weeks ago
- ☆526Updated 4 months ago
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL☆2,258Updated this week
- Distributed RL System for LLM Reasoning☆1,205Updated last week
- Official Repo for Open-Reasoner-Zero☆1,904Updated last month
- A very simple GRPO implementation for reproducing R1-like LLM thinking.☆1,014Updated last month
- Collect every awesome work about r1!☆356Updated last week
- Minimal-cost training of a 0.5B R1-Zero☆714Updated last week
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning☆808Updated last week
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆310Updated 2 weeks ago
- Awesome RL Reasoning Recipes ("Triple R")☆520Updated this week
- AN O1 REPLICATION FOR CODING☆333Updated 4 months ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR☆1,198Updated 3 weeks ago
- ☆671Updated last week
- Explore the Multimodal “Aha Moment” on 2B Model☆583Updated last month
- ☆168Updated last month
- Scalable RL solution for advanced reasoning of language models☆1,529Updated last month
- A fork to add multimodal model training to open-r1☆1,245Updated 3 months ago
- Reproduce R1 Zero on Logic Puzzle☆2,327Updated last month
- Understanding R1-Zero-Like Training: A Critical Perspective☆908Updated 3 weeks ago
- R1-onevision, a visual language model capable of deep CoT reasoning.☆513Updated 3 weeks ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆590Updated this week
- ☆924Updated 3 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆188Updated 2 months ago
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners☆540Updated 2 weeks ago