s-smits / grpo-optuna

Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna

☆55

Alternatives and similar repositories for grpo-optuna

Users who are interested in grpo-optuna are comparing it to the libraries listed below.

brendanhogan / picoDeepResearch
☆64 Updated 2 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 5 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆103 Updated 4 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 Updated 2 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63 Updated 9 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆103 Updated 4 months ago
catid / lllm
Latent Large Language Models
☆18 Updated 11 months ago
arcee-ai / DAM
☆53 Updated 8 months ago
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆49 Updated 5 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.
☆45 Updated 2 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆78 Updated last week
xjdr-alt / muzero_sketch
☆38 Updated last year
waefrebeorn / KAN-WuBu-Memory
An AI character interaction system with emotional modeling and advanced memory management
☆16 Updated 9 months ago
allenai / infinigram-api
☆70 Updated 2 weeks ago
axolotl-ai-cloud / axolotl-cookbook
☆34 Updated 4 months ago
rosmineb / unit_test_rl
Project code for training LLMs to write better unit tests + code
☆21 Updated 2 months ago
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 Updated 7 months ago
nahidalam / maya
Maya: An Instruction Finetuned Multilingual Multimodal Model using Aya
☆117 Updated last week
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 6 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆68 Updated 3 months ago
goncalorafaria / qalign
QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.
☆23 Updated 3 months ago
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 Updated 8 months ago
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 Updated 3 months ago
yueqis / API-Based-Agent
☆54 Updated last month
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆72 Updated 4 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140 Updated 5 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 6 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆68 Updated 3 months ago
QuixiAI / grokadamw
☆134 Updated 11 months ago
reka-ai / rekaquant
☆58 Updated 3 weeks ago