ALucek / GRPO-Training
An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning
☆32Updated 2 weeks ago
Alternatives and similar repositories for GRPO-Training
Users that are interested in GRPO-Training are comparing it to the libraries listed below
- So, I trained a 130M Llama architecture I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset…☆14Updated 2 months ago
- LLM reads a paper and produces a working prototype☆57Updated last month
- ☆59Updated last week
- Simple GRPO scripts and configurations.☆58Updated 4 months ago
- Build a Recommendation System Agent using LATS Agent Approach☆30Updated 3 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆29Updated 3 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆113Updated 3 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆68Updated 5 months ago
- ☆29Updated last year
- ☆1Updated 10 months ago
- ☆14Updated last year
- Large Language Model (LLM) powered evaluator for Retrieval Augmented Generation (RAG) pipelines.☆27Updated last year
- Simple examples using Argilla tools to build AI☆53Updated 6 months ago
- Train transformer language models with reinforcement learning.☆19Updated 3 months ago
- Diagnose the performance of your RAG🩺☆36Updated 2 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.☆24Updated 2 months ago
- Agentic RAG to help you build a startup🚀☆43Updated 2 months ago
- A reasoning assistant for your STEM education☆19Updated 2 months ago
- ☆19Updated this week
- ☆46Updated 2 months ago
- ☆49Updated 6 months ago
- Fine tune Gemma 3 on an object detection task☆43Updated this week
- rl from zero pretrain, can it be done? we'll see.☆24Updated this week
- A virtual agent for your virtual books📚☆21Updated 2 weeks ago
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models".☆75Updated 2 months ago
- ☆39Updated last month
- Train your own SOTA deductive reasoning model☆92Updated 2 months ago
- ☆50Updated this week
- This is an open-source version of OpenAI's O1 Model Series by Siraj Raval & O1-Preview☆97Updated 7 months ago
- ☆54Updated 4 months ago