raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
☆138 · Updated 11 months ago
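The DPO mentioned in the description is Direct Preference Optimization. As a quick illustration only (not code from this repository), the loss for a single preference pair can be sketched in plain Python; the function name and the scalar log-probability inputs are illustrative assumptions:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are total sequence log-probabilities of the chosen and
    rejected responses under the trained policy and under the frozen
    reference model; beta controls the strength of the KL-style
    regularization toward the reference model.
    """
    # How much more the policy prefers chosen over rejected,
    # relative to the reference model's preference.
    margin = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    # Negative log-sigmoid of the scaled margin: minimized when the
    # policy widens the chosen-vs-rejected gap beyond the reference.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When policy and reference assign identical preference margins, the loss is ln 2 ≈ 0.693; it decreases as the policy prefers the chosen response more strongly than the reference does.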
Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO:
Users interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below.
- (ICML 2024) AlphaZero-like Tree-Search can guide large language model decoding and training — ☆257 · Updated 8 months ago
- OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc. — ☆186 · Updated last week
- ☆92 · Updated 3 weeks ago
- RewardBench: the first evaluation tool for reward models. — ☆504 · Updated this week
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI — ☆103 · Updated last week
- ☆251 · Updated last year
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… — ☆295 · Updated 2 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" — ☆462 · Updated 11 months ago
- A series of technical reports on Slow Thinking with LLMs — ☆398 · Updated last week
- The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024] — ☆183 · Updated 2 months ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement — ☆173 · Updated 10 months ago
- ☆305 · Updated 8 months ago
- From-scratch implementation of a sparse mixture-of-experts language model, inspired by Andrej Karpathy's makemore :) — ☆634 · Updated 3 months ago
- Controlled Text Generation via Language Model Arithmetic — ☆214 · Updated 5 months ago
- Official repository for ORPO — ☆435 · Updated 8 months ago
- Work done by the Oxen.ai Community, attempting to reproduce the Self-Rewarding Language Model paper from Meta AI. — ☆117 · Updated 3 months ago
- ☆304 · Updated 5 months ago
- Notes and commented code for RLHF (PPO) — ☆69 · Updated 11 months ago
- ☆316 · Updated last week
- An implementation of Everything of Thoughts (XoT). — ☆139 · Updated 11 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model — ☆110 · Updated 2 months ago
- Official implementation of the paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) — ☆172 · Updated 4 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models — ☆372 · Updated 3 months ago
- We present the first systematic study on the scaling property of raw agents instantiated by LLMs. We find that performance scales with th… — ☆102 · Updated 4 months ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 — ☆119 · Updated 3 months ago
- ☆106 · Updated 3 weeks ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation — ☆295 · Updated 3 months ago
- ☆256 · Updated 6 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning — ☆348 · Updated 5 months ago