hkproj / rlhf-ppo
Notes and commented code for RLHF (PPO)
☆69 Updated 11 months ago
Alternatives and similar repositories for rlhf-ppo:
Users who are interested in rlhf-ppo are comparing it to the libraries listed below:
- Direct Preference Optimization from scratch in PyTorch ☆80 Updated last year
- ☆132 Updated 2 months ago
- augmented LLM with self reflection ☆111 Updated last year
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆152 Updated this week
- A brief and partial summary of RLHF algorithms. ☆93 Updated 2 months ago
- Survey of Small Language Models from Penn State, ... ☆156 Updated last month
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) ☆155 Updated 3 weeks ago
- ☆92 Updated last month
- ☆130 Updated 2 months ago
- ☆95 Updated 7 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆289 Updated 6 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆172 Updated 9 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆173 Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆95 Updated 4 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆114 Updated 7 months ago
- ☆88 Updated last month
- ☆98 Updated 2 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆103 Updated last week
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆216 Updated this week
- Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey` ☆159 Updated 2 months ago
- A series of technical reports on Slow Thinking with LLM ☆409 Updated last week
- Critique-out-Loud Reward Models ☆51 Updated 4 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆55 Updated 2 months ago
- ☆115 Updated 8 months ago
- A curated list of Large Language Models with RAG ☆78 Updated last year
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆100 Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆215 Updated 3 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆111 Updated 3 months ago