raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
☆118 · Updated 8 months ago
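The description above names reward-model training and DPO. The sketch below shows the two standard pairwise objectives in plain Python. It is not code from this repository, and the function names are hypothetical. It assumes each response's log-probability has already been summed over its tokens, and it uses β = 0.1 only as an illustrative value.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise (Bradley-Terry style) reward-model loss: -log sigmoid(r_c - r_r)."""
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-(reward_chosen - reward_rejected))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss from sequence log-probabilities (summed over response tokens)."""
    # Log-ratio of the trainable policy vs. the frozen reference model per response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta sets the strength of the implicit KL penalty toward the reference model.
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


if __name__ == "__main__":
    # Hypothetical scores / log-probabilities for a single preference pair.
    print(reward_model_loss(1.5, 0.5))
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1))
```

Both losses equal log 2 when the model has no preference between the two responses. They shrink as the chosen response is favored. In DPO, the explicit reward is replaced by the policy-to-reference log-ratio scaled by β, so no separate reward model is queried during optimization.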
Related projects
Alternatives and complementary repositories for LLM-RLHF-Tuning-with-PPO-and-DPO
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) · ☆115 · Updated 2 weeks ago
- Micro Llama: a small Llama-based model with 300M parameters, trained from scratch on a $500 budget · ☆131 · Updated 7 months ago
- A pipeline for LLM knowledge distillation · ☆78 · Updated 3 months ago
- Work by the Oxen.ai community to reproduce the Self-Rewarding Language Models paper from Meta AI · ☆110 · Updated last week
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" · ☆124 · Updated 4 months ago
- Layer-Condensed KV cache with a 10 times larger batch size, fewer parameters, and less computation. Dramatic speedup with better task performance… · ☆139 · Updated this week
- RewardBench: the first evaluation tool for reward models · ☆436 · Updated 3 weeks ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation · ☆270 · Updated 3 weeks ago
- Repository for “PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers” (NAACL 2024) · ☆126 · Updated 5 months ago
- The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024] · ☆131 · Updated last week
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" · ☆448 · Updated 8 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks · ☆129 · Updated 2 months ago
- Self-playing Adversarial Language Game Enhances LLM Reasoning (NeurIPS 2024) · ☆100 · Updated 2 weeks ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL · ☆208 · Updated last week
- awesome synthetic (text) datasets · ☆242 · Updated 3 weeks ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement · ☆156 · Updated 7 months ago
- Code for the NeurIPS'24 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" · ☆162 · Updated last month
- Reformatted Alignment · ☆112 · Updated last month
- Official repository for ORPO · ☆421 · Updated 5 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] · ☆124 · Updated 3 weeks ago
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" · ☆191 · Updated last month
- Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf) · ☆42 · Updated 3 months ago
- A bagel, with everything. · ☆312 · Updated 7 months ago
- Evaluating LLMs with fewer examples · ☆135 · Updated 7 months ago
- AWM: Agent Workflow Memory · ☆208 · Updated last month