Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
☆188Feb 24, 2026Updated last month
Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO
Users who are interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below.
- A lightweight, dependency-free (besides `libcurl`) command-line tool written in C to download the transcript of any YouTube video. It dir…☆22Aug 25, 2025Updated 7 months ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extreme Length (ICLR 2024)☆209May 20, 2024Updated last year
- Notes and commented code for RLHF (PPO)☆127Feb 27, 2024Updated 2 years ago
- A Data Source for Reasoning Embodied Agents☆19Sep 18, 2023Updated 2 years ago
- Official Implementation of "Probing Language Models for Pre-training Data Detection"☆20Dec 4, 2024Updated last year
- ☆41Jul 6, 2025Updated 9 months ago
- Source code for ICLR 2021 paper: "Molecule Optimization by Explainable Evolution"☆30May 29, 2021Updated 4 years ago
- Fine-tune LLM agents with online reinforcement learning☆1,250Mar 19, 2024Updated 2 years ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.☆868Jan 15, 2024Updated 2 years ago
- Action library for AI Agent☆228Mar 31, 2025Updated last year
- An approximate implementation of the OpenAI paper - An Empirical Model of Large-Batch Training for MNIST☆11Nov 19, 2022Updated 3 years ago
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing.☆79Feb 19, 2025Updated last year
- ☆13Jul 2, 2025Updated 9 months ago
- An expression parser supporting multiple types☆21Sep 25, 2024Updated last year
- ☆57Apr 23, 2024Updated last year
- Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse'☆13Aug 2, 2024Updated last year
- ☆284Jan 6, 2025Updated last year
- A comprehensive paper list of Reasoning over Tables.☆30Nov 6, 2022Updated 3 years ago
- Reference implementation of Megalodon 7B model☆527May 17, 2025Updated 10 months ago
- Example code for the NNGeometry PyTorch library☆10Aug 20, 2025Updated 7 months ago
- ☆17Jun 10, 2025Updated 10 months ago
- More reliable Video Understanding Evaluation☆15Sep 23, 2025Updated 6 months ago
- ☆22Oct 5, 2023Updated 2 years ago
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Feb 5, 2025Updated last year
- Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021.☆56Nov 1, 2022Updated 3 years ago
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation.☆115Jan 6, 2024Updated 2 years ago
- Source code for paper "On the Pareto Front of Multilingual Neural Machine Translation" @ NeurIPS 2023☆17Sep 27, 2023Updated 2 years ago
- Examples of MolScore implementations☆12May 30, 2024Updated last year
- A barely barebone NumPy implementation of Hierarchical Temporal Memory.☆11Mar 26, 2023Updated 3 years ago
- ☆14Nov 12, 2024Updated last year
- Safe Python Code Execution Environment for Language Models☆17Mar 27, 2026Updated last week
- Reaching LLaMA2 Performance with 0.1M Dollars☆988Jul 23, 2024Updated last year
- AI Travel Planner App Tutorial from TubeGuruji.☆10Mar 15, 2025Updated last year
- 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.☆81Mar 17, 2022Updated 4 years ago
- MLIR backend for Nx☆14May 24, 2024Updated last year
- ☆16Apr 4, 2025Updated last year
- Official implementation of "BERTs are Generative In-Context Learners"☆32Mar 14, 2025Updated last year
- Implementation of OpenAI paper with Simple Noise Scale on Fastai V2☆19Apr 16, 2021Updated 4 years ago
- Offline-first, decentralized graph database of collaborative Web apps☆15May 12, 2017Updated 8 years ago