axolotl-ai-cloud / grpo_code

A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.

☆40

Alternatives and similar repositories for grpo_code

Users who are interested in grpo_code are comparing it to the libraries listed below.

OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107 Updated 8 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 Updated last month
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆84 Updated 8 months ago
PrimeIntellect-ai / genesys
☆136 Updated 8 months ago
brendanhogan / picoDeepResearch
☆68 Updated 6 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 9 months ago
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 Updated last year
collinear-ai / spider
Streamline on-policy/off-policy distillation workflows in a few lines of code
☆65 Updated last week
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆225 Updated this week
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆56 Updated last year
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆80 Updated 7 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
☆60 Updated 6 months ago
arcee-ai / DAM
☆55 Updated last year
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 10 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 10 months ago
StigLidu / DualDistill
[EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
☆100 Updated 3 months ago
OpenPipe / open_deep_research_training
Training setup for Langchain's Open Deep Research
☆72 Updated 3 months ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆79 Updated last year
kurakurai / Luth
Luth is a state-of-the-art series of fine-tuned LLMs for French
☆40 Updated last month
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆72 Updated 7 months ago
geronimi73 / phi2-finetune
☆86 Updated last year
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆52 Updated 9 months ago
EduardTalianu / EntropixLab
entropix style sampling + GUI
☆27 Updated last year
phunterlau / paper_without_code
LLM reads a paper and produces a working prototype
☆60 Updated 7 months ago
axolotl-ai-cloud / axolotl-cookbook
☆36 Updated 4 months ago
yueqis / API-Based-Agent
☆62 Updated 5 months ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆110 Updated 7 months ago
Intelligent-Internet / ii-thought
II-Thought-RL is our initial attempt at developing a large-scale, multi-domain Reinforcement Learning (RL) dataset
☆29 Updated 7 months ago
letta-ai / sleep-time-compute
Accompanying material for the sleep-time compute paper
☆118 Updated 7 months ago
axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18 Updated 11 months ago