thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆86 Updated 2 years ago
Alternatives and similar repositories for minRLHF
Users who are interested in minRLHF are comparing it to the libraries listed below.
- ☆98 Updated 2 years ago
- ☆152 Updated 11 months ago
- RLHF implementation details of OAI's 2019 codebase ☆191 Updated last year
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆180 Updated last year
- ☆159 Updated 2 years ago
- Simple next-token-prediction for RLHF ☆226 Updated 2 years ago
- Self-Alignment with Principle-Following Reward Models ☆168 Updated last month
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆209 Updated 2 years ago
- Implementation of Reinforcement Learning from Human Feedback (RLHF) ☆172 Updated 2 years ago
- A repository for transformer critique learning and generation ☆88 Updated last year
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆217 Updated 2 years ago
- A minimum example of aligning language models with RLHF similar to ChatGPT ☆223 Updated 2 years ago
- RL algorithm: Advantage induced policy alignment ☆65 Updated 2 years ago
- DSIR large-scale data selection framework for language model training ☆261 Updated last year
- ☆100 Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆204 Updated last year
- Scaling Data-Constrained Language Models ☆342 Updated 3 months ago
- ☆242 Updated 2 years ago
- ☆128 Updated last year
- ☆106 Updated 3 months ago
- [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks ☆56 Updated 2 years ago
- Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models" ☆102 Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆201 Updated last year
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆57 Updated last year
- Plug in and play implementation of "Textbooks Are All You Need", ready for training, inference, and dataset generation ☆73 Updated 2 years ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆140 Updated 8 months ago
- Pre-training code for Amber 7B LLM ☆168 Updated last year
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆80 Updated last year
- ☆179 Updated 2 years ago
- ☆280 Updated 9 months ago