sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆568 · Updated this week
Alternatives and similar repositories for understand-r1-zero:
Users interested in understand-r1-zero are comparing it to the libraries listed below.
- Large Reasoning Models ☆800 · Updated 3 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. ☆283 · Updated this week
- ☆559 · Updated 2 weeks ago
- LIMO: Less is More for Reasoning ☆875 · Updated last month
- Official Repo for Open-Reasoner-Zero ☆1,687 · Updated 3 weeks ago
- Recipes to scale inference-time compute of open models ☆1,048 · Updated last month
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆915 · Updated this week
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆597 · Updated 2 months ago
- A series of technical reports on Slow Thinking with LLM ☆595 · Updated last week
- Muon is Scalable for LLM Training ☆974 · Updated last month
- ☆485 · Updated last week
- Explore the Multimodal "Aha Moment" on 2B Model ☆524 · Updated last week
- ☆262 · Updated last week
- [ICML 2024] CLLMs: Consistency Large Language Models ☆388 · Updated 4 months ago
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆474 · Updated last week
- ☆913 · Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆310 · Updated 3 months ago
- OLMoE: Open Mixture-of-Experts Language Models ☆693 · Updated 2 weeks ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆216 · Updated this week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆376 · Updated last week
- ☆518 · Updated last week
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates ☆353 · Updated last week
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning ☆395 · Updated this week
- Training Large Language Models to Reason in a Continuous Latent Space ☆998 · Updated 2 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆1,265 · Updated this week
- A bibliography and survey of the papers surrounding o1 ☆1,183 · Updated 4 months ago
- Scalable RL solution for advanced reasoning of language models ☆1,445 · Updated last week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆706 · Updated 6 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆590 · Updated last week
- Pretraining code for a large-scale depth-recurrent language model ☆709 · Updated 2 weeks ago