McGill-NLP / nano-aha-moment

Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"

☆512

Alternatives and similar repositories for nano-aha-moment

Users who are interested in nano-aha-moment compare it to the libraries listed below.

sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,055 Updated 2 weeks ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆245 Updated 3 weeks ago
sail-sg / oat
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
☆425 Updated last week
groundlight / r1_vlm
Build your own visual reasoning model
☆401 Updated last week
PrimeIntellect-ai / prime-rl
Decentralized RL Training at Scale
☆403 Updated this week
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆344 Updated 7 months ago
huggingface / search-and-learn
Recipes to scale inference-time compute of open models
☆1,110 Updated 2 months ago
huggingface / picotron_tutorial
☆208 Updated 5 months ago
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆274 Updated 6 months ago
open-thought / reasoning-gym
procedural reasoning datasets
☆1,012 Updated last week
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆274 Updated 2 months ago
srush / awesome-o1
A bibliography and survey of the papers surrounding o1
☆1,209 Updated 8 months ago
facebookresearch / coconut
Training Large Language Models to Reason in a Continuous Latent Space
☆1,224 Updated 6 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆270 Updated this week
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆698 Updated this week
microsoft / rStar
☆608 Updated 3 weeks ago
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆164 Updated 5 months ago
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆578 Updated this week
seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆808 Updated 3 weeks ago
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆322 Updated 8 months ago
marin-community / marin
☆353 Updated this week
NVIDIA / NeMo-Skills
A project to improve skills of large language models
☆507 Updated this week
facebookresearch / MLGym
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
☆538 Updated 2 weeks ago
natolambert / rlhf-book
Textbook on reinforcement learning from human feedback
☆1,158 Updated this week
stanford-cs336 / spring2024-lectures
☆339 Updated 7 months ago
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆344 Updated 3 weeks ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆225 Updated last week
mlfoundations / evalchemy
Automatic evals for LLMs
☆509 Updated last month
open-thought / system-2-research
System 2 Reasoning Link Collection
☆849 Updated 4 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆283 Updated this week