thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆86 · Updated last year
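For orientation, the training step at the heart of PPO-based RLHF libraries like this one is a clipped policy-gradient update with a KL penalty toward a frozen reference model. The sketch below illustrates that update on a toy categorical policy in plain PyTorch; it is an illustrative sketch only, not minRLHF's actual API, and every name and hyperparameter in it is hypothetical.

```python
import torch
import torch.nn.functional as F

# Toy setup: the "policy" is just a trainable logit table over a small vocabulary.
vocab_size, batch_size = 16, 8
policy_logits = torch.randn(batch_size, vocab_size, requires_grad=True)
ref_logits = policy_logits.detach().clone()  # frozen reference policy (no gradient)
optimizer = torch.optim.Adam([policy_logits], lr=1e-2)

# "Rollout": sample one token per example from the old policy and record its log-prob.
with torch.no_grad():
    old_logprobs_all = F.log_softmax(policy_logits, dim=-1)
    actions = torch.multinomial(old_logprobs_all.exp(), num_samples=1).squeeze(-1)
    old_logprobs = old_logprobs_all.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    rewards = torch.randn(batch_size)  # stand-in for a reward-model score

clip_eps, kl_coef = 0.2, 0.1
for _ in range(4):  # a few PPO epochs over the same rollout
    logprobs_all = F.log_softmax(policy_logits, dim=-1)
    new_logprobs = logprobs_all.gather(-1, actions.unsqueeze(-1)).squeeze(-1)

    # KL(policy || reference) keeps the tuned policy close to the reference model.
    kl = (logprobs_all.exp() * (logprobs_all - F.log_softmax(ref_logits, dim=-1))).sum(-1)
    advantages = rewards - rewards.mean() - kl_coef * kl.detach()

    # PPO clipped surrogate objective.
    ratio = (new_logprobs - old_logprobs).exp()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    loss = -torch.min(unclipped, clipped).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Real RLHF codebases apply the same objective per token over sampled completions from a full language model, usually with a learned value head for advantage estimation rather than the crude mean baseline used here.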
Related projects
Alternatives and complementary repositories for minRLHF
- ☆94 · Updated last year
- RLHF implementation details of OAI's 2019 codebase ☆152 · Updated 10 months ago
- ☆114 · Updated 4 months ago
- ☆158 · Updated last year
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆105 · Updated 7 months ago
- ☆90 · Updated 4 months ago
- Chain-of-Hindsight, a scalable RLHF method ☆220 · Updated last year
- A repository for transformer critique learning and generation ☆86 · Updated 11 months ago
- Code accompanying the paper "Pretraining Language Models with Human Preferences" ☆177 · Updated 9 months ago
- ☆112 · Updated last month
- Self-Alignment with Principle-Following Reward Models ☆148 · Updated 8 months ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆199 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆107 · Updated 4 months ago
- DSIR, a large-scale data selection framework for language model training ☆230 · Updated 7 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆80 · Updated last week
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆138 · Updated 2 months ago
- Code and data used in the paper "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆26 · Updated 5 months ago
- Code and data for "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models" ☆219 · Updated 2 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆83 · Updated last week
- A pipeline to improve the skills of large language models ☆191 · Updated this week
- Repo for the paper "Shepherd: A Critic for Language Model Generation" ☆213 · Updated last year
- Code for the ACL 2024 paper "Adversarial Preference Optimization" (APO) ☆49 · Updated 5 months ago
- ☆73 · Updated 4 months ago
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆151 · Updated 11 months ago
- Code for the paper "Autonomous Evaluation and Refinement of Digital Agents" [COLM 2024] ☆97 · Updated last month
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆127 · Updated 2 months ago
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax ☆58 · Updated 3 months ago
- Language models scale reliably with over-training and on downstream tasks ☆94 · Updated 7 months ago
- A toolkit for scaling law research ⚖ ☆43 · Updated 8 months ago
- ☆72 · Updated 5 months ago