ALucek / GRPO-Training
An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning
☆37 Updated 5 months ago
Alternatives and similar repositories for GRPO-Training
Users who are interested in GRPO-Training are comparing it to the libraries listed below
- An LLM reads a paper and produces a working prototype ☆57 Updated 7 months ago
- Build a Recommendation System Agent using the LATS Agent Approach ☆33 Updated 8 months ago
- The code repository of the paper: Competition and Attraction Improve Model Fusion ☆165 Updated 2 months ago
- ☆25 Updated last year
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". ☆84 Updated 7 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆58 Updated 3 weeks ago
- ☆86 Updated last year
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated 11 months ago
- Train your own SOTA deductive reasoning model ☆108 Updated 8 months ago
- ☆46 Updated 7 months ago
- ☆68 Updated 5 months ago
- ☆98 Updated 7 months ago
- Simple GRPO scripts and configurations. ☆59 Updated 9 months ago
- Luth is a state-of-the-art series of fine-tuned LLMs for French ☆38 Updated last month
- ☆182 Updated 9 months ago
- Advanced Coding AI Assistant that uses a Gradio interface to stream coding-related responses. ChatRAG supports local and API inference an… ☆22 Updated 6 months ago
- Craft post-training data recipes ☆60 Updated this week
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 9 months ago
- ☆55 Updated 2 months ago
- ☆158 Updated 7 months ago
- Framework for building, orchestrating and deploying multi-agent systems. Managed by OpenAI Solutions team. Experimental framework. ☆92 Updated last year
- The purpose of this repo is to implement LLMOps as shared in the DeepLearning.AI course ☆35 Updated 2 weeks ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct ☆31 Updated 8 months ago
- ☆14 Updated last year
- ☆43 Updated last week
- ☆119 Updated last year
- Inference, Fine Tuning and many more recipes with the Gemma family of models ☆275 Updated 3 months ago
- This project is a **proof of concept** that aims to replicate the reasoning capabilities of OpenAI's newly released O1 model. ☆90 Updated 9 months ago
- Example implementation of Iteration of Thought - Give a star if you like the project ☆41 Updated 10 months ago
- Agentic RAG to help you build a startup🚀 ☆55 Updated 7 months ago