stanford-cs336 / assignment5-alignment

☆82

Alternatives and similar repositories for assignment5-alignment

Users who are interested in assignment5-alignment are comparing it to the repositories listed below.

tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆281 Updated 2 months ago
mingyin0312 / RLFromScratch
☆463 Updated 3 months ago
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆303 Updated 10 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆164 Updated 7 months ago
stanford-cs336 / assignment2-systems
Student version of Assignment 2 for Stanford CS336 - Language Modeling From Scratch
☆131 Updated 4 months ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆215 Updated 8 months ago
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆126 Updated 6 months ago
SiliangZeng / Multi-Turn-RL-Agent
☆100 Updated 5 months ago
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆100 Updated 8 months ago
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆145 Updated 10 months ago
facebookresearch / PhysicsLM4
Physics of Language Models, Part 4
☆262 Updated 4 months ago
hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆118 Updated last year
neubig / minllama-assignment
☆99 Updated last year
LeonGuertler / UnstableBaselines
☆107 Updated this week
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆309 Updated 2 months ago
yihedeng9 / rlhf-summary-notes
A brief and partial summary of RLHF algorithms.
☆139 Updated 9 months ago
eth-sri / matharena
Evaluation of LLMs on latest math competitions
☆197 Updated last month
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆383 Updated 2 weeks ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆322 Updated this week
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆249 Updated 3 months ago
axon-rl / gem
A Gym for Agentic LLMs
☆371 Updated 3 weeks ago
huggingface / picotron_tutorial
☆224 Updated last week
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112 Updated last month
Zhiyuan-Zeng / RLVE
[Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
☆151 Updated 3 weeks ago
stanford-cs336 / spring2024-lectures
☆403 Updated 11 months ago
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆380 Updated 4 months ago
epfml / llm-optimizer-benchmark
Benchmarking Optimizers for LLM Pretraining
☆42 Updated 3 weeks ago
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69 Updated 4 months ago
AIMO-CMU-MATH / CMU_MATH-AIMO
☆79 Updated last year
ulab-uiuc / tiny-scientist
[EMNLP 2025 Demo] TinyScientist: A Lightweight Framework for Building Research Agents
☆119 Updated last month