natolambert / rlhf-book
Textbook on reinforcement learning from human feedback
☆894 · Updated last week
Alternatives and similar repositories for rlhf-book
Users who are interested in rlhf-book are comparing it to the libraries listed below
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,349 · Updated 4 months ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆450 · Updated this week
- Understanding R1-Zero-Like Training: A Critical Perspective ☆925 · Updated last month
- Awesome Reasoning LLM Tutorial/Survey/Guide ☆1,605 · Updated last month
- Verifiers for LLM Reinforcement Learning ☆953 · Updated this week
- Recipes to scale inference-time compute of open models ☆1,071 · Updated last week
- procedural reasoning datasets ☆580 · Updated this week
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,328 · Updated 3 weeks ago
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,265 · Updated 2 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,192 · Updated 6 months ago
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,109 · Updated 3 months ago
- Code for BLT research paper ☆1,587 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,516 · Updated last week
- Synthetic data curation for post-training and structured data extraction ☆1,324 · Updated this week
- System 2 Reasoning Link Collection ☆833 · Updated 2 months ago
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents ☆492 · Updated this week
- Build your own visual reasoning model ☆362 · Updated this week
- LIMO: Less is More for Reasoning ☆940 · Updated last month
- Minimal and annotated implementations of key ideas from modern deep learning research. ☆524 · Updated last week
- nanoGPT-style version of Llama 3.1 ☆1,367 · Updated 9 months ago
- Model Activity Visualiser ☆477 · Updated last month
- OLMoE: Open Mixture-of-Experts Language Models ☆746 · Updated 2 months ago
- Automatic evals for LLMs ☆388 · Updated this week
- ☆1,019 · Updated 5 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,464 · Updated 2 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,700 · Updated this week
- Democratizing Reinforcement Learning for LLMs ☆3,236 · Updated this week
- Atom of Thoughts for Markov LLM Test-Time Scaling ☆563 · Updated this week
- AllenAI's post-training codebase ☆2,950 · Updated this week
- Official Repo for Open-Reasoner-Zero ☆1,916 · Updated last month