sdiehl / tiny-r1
Recreating the minimal training methods of DeepSeek-R1 for small language models.
☆21 Updated 10 months ago
Alternatives and similar repositories for tiny-r1
Users who are interested in tiny-r1 are comparing it to the libraries listed below.
- A tiny 1000 line implementation of GraphRAG in Python ☆90 Updated last month
- 🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper ☆39 Updated 6 months ago
- LILO: Library Induction with Language Observations ☆89 Updated last year
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, … ☆43 Updated 2 years ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆114 Updated 2 months ago
- ☆128 Updated 6 months ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆74 Updated last year
- LLM verified with Monte Carlo Tree Search ☆284 Updated 8 months ago
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 Updated last year
- Library for training process reward models ☆29 Updated 6 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆63 Updated 8 months ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆40 Updated last year
- Simple high-throughput inference library ☆153 Updated 7 months ago
- This is the official repository for all the code of TheoremLlama ☆47 Updated 4 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆133 Updated last year
- Composable inference algorithms with LLMs and programmable logic ☆69 Updated last year
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆90 Updated last year
- ☆35 Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆100 Updated 2 years ago
- ☆68 Updated last year
- Source code for paper: INTERVENOR: Prompt the Coding Ability of Large Language Models with the Interactive Chain of Repairing ☆28 Updated last year
- First-order logic theorem prover supporting unification with approximate vector similarity ☆13 Updated 2 years ago
- Code repo for MathAgent ☆18 Updated 2 years ago
- ☆39 Updated last year
- ☆80 Updated 9 months ago
- Official Implementation of "Simulating Environments with Reasoning Models for Agent Training" ☆48 Updated last week
- ☆63 Updated last year
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task ☆48 Updated 11 months ago