natolambert / rlhf-book
Textbook on reinforcement learning from human feedback
☆1,052 Updated last week
Alternatives and similar repositories for rlhf-book
Users who are interested in rlhf-book are comparing it to the repositories listed below
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,438 Updated 2 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,554 Updated 3 weeks ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,434 Updated 5 months ago
- Recipes to scale inference-time compute of open models ☆1,097 Updated last month
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆476 Updated last month
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,310 Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning ☆1,328 Updated this week
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,162 Updated 5 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆991 Updated last month
- Awesome Reasoning LLM Tutorial/Survey/Guide ☆1,781 Updated last week
- Democratizing Reinforcement Learning for LLMs ☆3,396 Updated last month
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,641 Updated this week
- nanoGPT style version of Llama 3.1 ☆1,386 Updated 10 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,201 Updated 7 months ago
- Synthetic data curation for post-training and structured data extraction ☆1,414 Updated last week
- Minimalistic large language model 3D-parallelism training ☆1,942 Updated this week
- System 2 Reasoning Link Collection ☆838 Updated 3 months ago
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆519 Updated last week
- AllenAI's post-training codebase ☆3,028 Updated this week
- Model Activity Visualiser ☆506 Updated 2 months ago
- Everything about the SmolLM2 and SmolVLM family of models ☆2,590 Updated 2 months ago
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. ☆2,016 Updated 3 weeks ago
- ☆1,025 Updated 6 months ago
- Procedural reasoning datasets ☆872 Updated last week
- ☆668 Updated last month
- LIMO: Less is More for Reasoning ☆963 Updated 2 months ago
- Code for BLT research paper ☆1,686 Updated last month
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆801 Updated 2 weeks ago
- Large Concept Models: Language modeling in a sentence representation space ☆2,233 Updated 4 months ago
- Fast State-of-the-Art Static Embeddings ☆1,740 Updated 3 weeks ago