raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
☆156 · Updated last year
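Of the two alignment algorithms the toolkit supports, DPO is the simpler to illustrate: it trains directly on preference pairs with no reward model or rollout loop. A minimal sketch of the pairwise DPO loss in plain Python (the function name and the default β are illustrative, not taken from this repository):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Pairwise DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed token log-probability of a full response
    under either the trained policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log sigmoid(logits): small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy has not moved from the reference, both margins are zero
# and the loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
```

Raising the policy's log-probability of the chosen response relative to the reference pushes the loss below log 2, which is the gradient signal DPO trains on.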
Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO
Users interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below.
- Official repository for ORPO ☆453 · Updated last year
- Minimal GRPO implementation from scratch ☆90 · Updated 2 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆367 · Updated last week
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆129 · Updated 3 months ago
- ☆114 · Updated 4 months ago
- RewardBench: the first evaluation tool for reward models ☆590 · Updated this week
- Code for "STaR: Bootstrapping Reasoning With Reasoning" (NeurIPS 2022) ☆206 · Updated 2 years ago
- ☆315 · Updated 8 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆148 · Updated 3 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆464 · Updated last year
- Controlled Text Generation via Language Model Arithmetic ☆221 · Updated 8 months ago
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ☆272 · Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior ☆239 · Updated last month
- ☆141 · Updated 6 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆161 · Updated 11 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation ☆307 · Updated 3 months ago
- Work by the Oxen.ai Community to reproduce the Self-Rewarding Language Model paper from Meta AI ☆128 · Updated 6 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆183 · Updated last year
- L1: Controlling How Long a Reasoning Model Thinks with Reinforcement Learning ☆213 · Updated 3 weeks ago
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆462 · Updated last year
- ☆97 · Updated 11 months ago
- ☆142 · Updated last year
- ☆309 · Updated 11 months ago
- Code for "Critique Fine-Tuning: Learning to Critique Is More Effective than Learning to Imitate" ☆151 · Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆245 · Updated last week
- ☆124 · Updated 11 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆102 · Updated 4 months ago
- Official implementation of the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆493 · Updated 4 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆207 · Updated last month
- Code and example data for the paper "Rule Based Rewards for Language Model Safety" ☆187 · Updated 10 months ago