jackfsuia / nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
☆50 · Updated last month
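Several of the algorithms listed above (GRPO, RLOO) replace PPO's learned value function with a group-relative baseline: each prompt is sampled several times and each completion's reward is normalized against its own group. A minimal sketch of that idea, assuming the standard GRPO formulation; the function name and shapes are illustrative and not taken from the nanoRLHF codebase:

```python
# Hypothetical sketch of GRPO's group-relative advantage (not nanoRLHF code):
# for G completions of one prompt, A_i = (r_i - mean(r)) / (std(r) + eps).
from statistics import mean, pstdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Normalize each completion's reward against its sampled group."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# Two correct and two incorrect completions of the same prompt:
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the baseline comes from the group itself, no separate critic network has to be trained, which is what makes these methods attractive on a single 40G GPU.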
Alternatives and similar repositories for nanoRLHF:
Users interested in nanoRLHF are comparing it to the libraries listed below.
- ☆60 · Updated 4 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆67 · Updated last week
- This is a personal reimplementation of Google's Infini-Transformer, utilizing a small 2B model. The project includes both model and train… ☆56 · Updated 11 months ago
- ☆98 · Updated 6 months ago
- ☆34 · Updated last month
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆72 · Updated last year
- On Memorization of Large Language Models in Logical Reasoning ☆60 · Updated this week
- ☆30 · Updated 6 months ago
- Pretrain, decay, and SFT a CodeLLM from scratch 🧙♂️ ☆35 · Updated 10 months ago
- A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆55 · Updated last year
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆125 · Updated 3 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆121 · Updated 8 months ago
- ☆101 · Updated 3 months ago
- ☆171 · Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆130 · Updated last month
- A tiny reproduction of DeepSeek R1-Zero on two A100s ☆54 · Updated 2 months ago
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP ☆95 · Updated last year
- Fast LLM training codebase with dynamic strategy selection (DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler) ☆36 · Updated last year
- ☆37 · Updated 3 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 · Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆115 · Updated last year
- Repository of the LV-Eval benchmark ☆61 · Updated 7 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training ☆180 · Updated last month
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning (COLM 2024 accepted paper) ☆30 · Updated 10 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆62 · Updated last year
- ☆61 · Updated 4 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆90 · Updated last week
- Hammer: Robust Function-Calling for On-Device Language Models via Function Masking ☆63 · Updated last month
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆46 · Updated 3 months ago