thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆86 Updated 2 years ago
Alternatives and similar repositories for minRLHF:
Users who are interested in minRLHF are comparing it to the libraries listed below.
- ☆96 Updated last year
- RLHF implementation details of OAI's 2019 codebase ☆178 Updated last year
- ☆133 Updated 2 months ago
- ☆160 Updated last year
- Simple next-token-prediction for RLHF ☆222 Updated last year
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆180 Updated last year
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆203 Updated last year
- ☆95 Updated 7 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆72 Updated 8 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆103 Updated last week
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆204 Updated 9 months ago
- Self-Alignment with Principle-Following Reward Models ☆154 Updated 11 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆130 Updated 9 months ago
- Unofficial implementation of AlpaGasus ☆90 Updated last year
- ☆95 Updated 4 months ago
- ☆82 Updated 4 months ago
- A repository for transformer critique learning and generation ☆88 Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 Updated 5 months ago
- Code repository for the c-BTM paper ☆105 Updated last year
- Awesome Reinforcement Learning from Human Feedback, the secret behind ChatGPT XD ☆23 Updated 2 years ago
- A toolkit for scaling law research ⚖ ☆47 Updated 3 weeks ago
- ☆22 Updated last year
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆76 Updated last year
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆65 Updated 6 months ago
- ☆130 Updated 2 months ago
- [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks ☆52 Updated last year
- ☆105 Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆132 Updated 10 months ago
- ☆117 Updated 4 months ago