jackfsuia / nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and reproducing DeepSeek R1-Zero.
☆74 Updated 8 months ago
Alternatives and similar repositories for nanoRLHF
Users who are interested in nanoRLHF are comparing it to the libraries listed below.
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆91 Updated 7 months ago
- ☆130 Updated last year
- ☆83 Updated 2 months ago
- ☆65 Updated 11 months ago
- ☆96 Updated 10 months ago
- ☆104 Updated 10 months ago
- On Memorization of Large Language Models in Logical Reasoning ☆72 Updated 7 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆152 Updated 10 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 Updated last year
- ☆33 Updated 4 months ago
- ☆107 Updated 3 months ago
- ☆34 Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆257 Updated 10 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆194 Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆247 Updated 6 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. ☆60 Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆190 Updated last year
- A research repo for experiments about Reinforcement Finetuning ☆52 Updated 6 months ago
- Fantastic Data Engineering for Large Language Models ☆91 Updated 10 months ago
- ☆161 Updated 9 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆45 Updated last year
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆136 Updated 6 months ago
- ☆153 Updated 11 months ago
- ☆83 Updated last year
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆220 Updated 3 months ago
- ☆118 Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆180 Updated 4 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆266 Updated last year
- Counting-Stars ( ★) ☆83 Updated 4 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆137 Updated 5 months ago