sdiehl / tiny-r1
Recreating the minimal training methods of DeepSeek-R1 for small language models.
☆22 · Updated last year
Alternatives and similar repositories for tiny-r1
Users interested in tiny-r1 are comparing it to the libraries listed below.
- 🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper ☆39 · Updated 7 months ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆41 · Updated last year
- A tiny 1000-line implementation of GraphRAG in Python ☆92 · Updated 3 months ago
- Pre-training code for CrystalCoder 7B LLM ☆57 · Updated last year
- ☆132 · Updated 8 months ago
- ☆63 · Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 · Updated last year
- Composable inference algorithms with LLMs and programmable logic ☆69 · Updated last year
- LILO: Library Induction with Language Observations ☆90 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆45 · Updated last year
- This is the official repository for all the code of TheoremLlama ☆47 · Updated 6 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆62 · Updated 10 months ago
- LLM verified with Monte Carlo Tree Search ☆284 · Updated 10 months ago
- ☆42 · Updated last year
- Data mapping framework for Rust ☆44 · Updated this week
- [ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline. ☆58 · Updated 6 months ago
- Library for training process reward models ☆29 · Updated 8 months ago
- ☆48 · Updated last year
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Updated 2 years ago
- [ACL 2025] NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering ☆22 · Updated 6 months ago
- First-order logic theorem prover supporting unification with approximate vector similarity ☆13 · Updated 2 years ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆74 · Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆114 · Updated last week
- Interview-based evaluation of LLMs ☆23 · Updated last year
- [ICLR 2026] Efficient Agent Training for Computer Use ☆135 · Updated 5 months ago
- ☆39 · Updated last year
- A Python reimplementation of "Planning with Large Language Models for Code Generation" (https://arxiv.org/abs/2303.05510) ☆18 · Updated 2 years ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆64 · Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆132 · Updated last year
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection. ☆22 · Updated last year