thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆86 · Updated 2 years ago
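Since the repository's focus is PPO-based RLHF fine-tuning, the snippet below is a minimal, generic sketch of PPO's clipped surrogate loss over response-token log-probabilities. The function name, arguments, and dummy values are illustrative assumptions and do not reflect minRLHF's actual API.

```python
# Generic sketch of the PPO clipped objective used in RLHF fine-tuning.
# Not minRLHF's API; names (ppo_clip_loss, clip_ratio, ...) are illustrative.
import torch

def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_ratio=0.2):
    """Clipped surrogate policy loss over sampled response tokens."""
    ratio = torch.exp(new_logprobs - old_logprobs)          # pi_new / pi_old per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * advantages
    return -torch.mean(torch.minimum(unclipped, clipped))   # negate to maximize the surrogate

# Example with dummy per-token values:
new_lp = torch.tensor([-1.2, -0.7, -2.0])
old_lp = torch.tensor([-1.0, -0.9, -2.1])
adv = torch.tensor([0.5, -0.3, 1.0])
print(ppo_clip_loss(new_lp, old_lp, adv))
```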
Alternatives and similar repositories for minRLHF:
Users interested in minRLHF are comparing it to the libraries listed below
- ☆96 · Updated last year
- RLHF implementation details of OAI's 2019 codebase ☆166 · Updated last year
- ☆125 · Updated last month
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆180 · Updated 11 months ago
- ☆161 · Updated last year
- A repository for transformer critique learning and generation ☆88 · Updated last year
- Simple next-token-prediction for RLHF ☆222 · Updated last year
- ☆93 · Updated 6 months ago
- ☆81 · Updated this week
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆124 · Updated 9 months ago
- ☆30 · Updated 2 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Lengths (ICLR 2024) ☆204 · Updated 7 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆102 · Updated 6 months ago
- Self-Alignment with Principle-Following Reward Models ☆150 · Updated 10 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆100Updated this week
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆71 · Updated 7 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆171 · Updated 3 months ago
- A toolkit for scaling law research ⚖ ☆43 · Updated last month
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆111 · Updated 2 months ago
- ☆93 · Updated 3 months ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆203 · Updated last year
- ☆89 · Updated this week
- ☆115 · Updated 3 months ago
- ☆119 · Updated last month
- ☆75 · Updated 6 months ago
- ☆265 · Updated last week
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆27 · Updated 7 months ago
- Reproducible, flexible LLM evaluations ☆118 · Updated last month
- DSIR large-scale data selection framework for language model training ☆242 · Updated 9 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆124 · Updated 6 months ago