sdiehl / tiny-r1
Recreating the minimal training methods of DeepSeek-R1 for small language models.
☆20Updated 2 months ago
Alternatives and similar repositories for tiny-r1:
Users who are interested in tiny-r1 are comparing it to the libraries listed below:
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.☆25Updated 9 months ago
- A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem.☆16Updated 5 months ago
- [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …☆40Updated last year
- Enable MoE for nanoGPT.☆25Updated last year
- First token cutoff sampling inference example☆29Updated last year
- Because it's there.☆16Updated 6 months ago
- This is the official repository for all the code of TheoremLlama☆40Updated 6 months ago
- program synthesis with neuro-symbolic differentiable interpreters☆13Updated last year
- Class of data structures that can be unfolded.☆22Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044☆32Updated 6 months ago
- Resources accompanying the "Zero-Shot Recommendation as Language Modeling" paper (ECIR2022)☆13Updated last year
- Simple repository for training small reasoning models☆12Updated 2 months ago
- Library for training process reward models☆23Updated last month
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆41Updated 10 months ago
- Training hybrid models for dummies.☆20Updated 3 months ago
- NanoGPT (124M) quality in 2.67B tokens☆28Updated this week
- FinRAG: Financial Retrieval Augmented Generation☆20Updated 7 months ago
- ☆21Updated last year
- ☆19Updated 7 months ago
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task☆42Updated 3 months ago
- High-performance tokenized language data-loader for Python C++ extension☆13Updated 8 months ago
- Proof in Lean of Fermat's Last Theorem for exponent 3☆38Updated 9 months ago
- A tiny 1000 line implementation of GraphRAG in Python☆66Updated last month
- Submission to the inverse scaling prize☆23Updated last year
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆96Updated last year
- LeanUniverse: A Library for Consistent and Scalable Lean4 Dataset Management☆61Updated 3 months ago
- Official repository for the paper "Goal-Conditioned Generators of Deep Policies"☆11Updated 2 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Updated last year
- Rust bindings for CTranslate2☆14Updated last year
- ☆51Updated last month