Qihoo360 / Light-R1
☆518Updated this week
Alternatives and similar repositories for Light-R1:
Users who are interested in Light-R1 are comparing it to the libraries listed below
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆376Updated this week
- A series of technical reports on Slow Thinking with LLM☆595Updated this week
- An Open-source RL System from ByteDance Seed and Tsinghua AIR☆767Updated last week
- Large Reasoning Models☆799Updated 3 months ago
- Minimal-cost training for 0.5B R1-Zero☆668Updated 2 weeks ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆177Updated last month
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆278Updated last week
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates☆353Updated this week
- ☆504Updated 2 months ago
- Official Repo for Open-Reasoner-Zero☆1,667Updated 3 weeks ago
- Scalable RL solution for advanced reasoning of language models☆1,419Updated last week
- ☆559Updated last week
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL☆1,681Updated this week
- A very simple GRPO implementation for reproducing r1-like LLM thinking.☆782Updated this week
- Collect every awesome work about r1!☆306Updated last week
- Explore the Multimodal “Aha Moment” on 2B Model☆524Updated last week
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning☆306Updated 3 weeks ago
- AN O1 REPLICATION FOR CODING☆329Updated 3 months ago
- ☆910Updated 2 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective☆568Updated this week
- OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next…☆124Updated this week
- ☆113Updated 2 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.☆1,210Updated this week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆597Updated 2 months ago
- ☆186Updated this week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆212Updated this week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆158Updated last week
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆225Updated last month
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆238Updated 2 months ago
- ☆260Updated last week