ALucek / GRPO-Training
An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning
☆34 · Updated last month
Alternatives and similar repositories for GRPO-Training
Users who are interested in GRPO-Training are comparing it to the libraries listed below.
- LLM reads a paper and produces a working prototype ☆58 · Updated 3 months ago
- So, I trained a 130M Llama architecture I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset… ☆15 · Updated 3 months ago
- Train your own SOTA deductive reasoning model ☆96 · Updated 4 months ago
- ☆179 · Updated 4 months ago
- Agentic RAG to help you build a startup🚀 ☆45 · Updated 3 months ago
- ☆156 · Updated 2 months ago
- ☆86 · Updated 9 months ago
- Simple examples using Argilla tools to build AI ☆53 · Updated 7 months ago
- This is an open-source version of OpenAI's O1 Model Series by Siraj Raval & O1-Preview ☆97 · Updated 8 months ago
- ☆94 · Updated 3 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 5 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆71 · Updated 3 months ago
- Simple GRPO scripts and configurations. ☆59 · Updated 5 months ago
- rl from zero pretrain, can it be done? we'll see. ☆65 · Updated 3 weeks ago
- ☆50 · Updated 2 weeks ago
- ☆64 · Updated last month
- ☆54 · Updated 5 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆115 · Updated 5 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆32 · Updated 3 months ago
- ☆71 · Updated 4 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆54 · Updated 5 months ago
- ☆42 · Updated 2 months ago
- Inference, Fine Tuning and many more recipes with Gemma family of models ☆223 · Updated last week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆73 · Updated 8 months ago
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". ☆79 · Updated 3 months ago
- A reasoning assistant for your STEM education ☆19 · Updated 4 months ago
- RAG example using DSPy, Gradio, FastAPI ☆83 · Updated last year
- A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon. ☆42 · Updated 5 months ago
- ☆22 · Updated 11 months ago
- ☆118 · Updated 10 months ago