raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
☆183 · Mar 18, 2024 · Updated last year
Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO
Users who are interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below
- ROS node to interface CNC machines using GRBL as the G-code interpreter ☆11 · Jun 25, 2019 · Updated 6 years ago
- Apache Hive Metastore in Standalone Mode With Docker ☆14 · Jul 22, 2024 · Updated last year
- Messing with Postgres query execution and hook infrastructure. ☆13 · Nov 19, 2023 · Updated 2 years ago
- ☆16 · Apr 4, 2025 · Updated 10 months ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆867 · Jan 15, 2024 · Updated 2 years ago
- An expression parser supporting multiple types ☆21 · Sep 25, 2024 · Updated last year
- A Data Source for Reasoning Embodied Agents ☆19 · Sep 18, 2023 · Updated 2 years ago
- Example showing how to embed a directory of files in a Zig executable ☆22 · Oct 24, 2025 · Updated 3 months ago
- Fine-tune LLM agents with online reinforcement learning ☆1,246 · Mar 19, 2024 · Updated last year
- Minimal AlphaZero in PyTorch, trained on Connect4 on a 6x6 board. ☆21 · Aug 12, 2022 · Updated 3 years ago
- ☆17 · May 19, 2023 · Updated 2 years ago
- Simple orchestration for EC2 spot containers ☆19 · Sep 27, 2024 · Updated last year
- SyncLite: Build Anything Sync Anywhere ☆156 · Dec 28, 2025 · Updated last month
- Scripts to create your own MoE models using MLX ☆90 · Feb 26, 2024 · Updated last year
- Reference implementation of the Megalodon 7B model ☆528 · May 17, 2025 · Updated 9 months ago
- Action library for AI agents ☆227 · Mar 31, 2025 · Updated 10 months ago
- Framework-Agnostic RL Environments for LLM Fine-Tuning ☆42 · Feb 9, 2026 · Updated last week
- Data about 349K OpenAI Custom GPTs ☆149 · Apr 29, 2024 · Updated last year
- Implementation of go-diff's diffmatchpatch in Zig ☆29 · Feb 7, 2026 · Updated last week
- ☆163 · Jul 2, 2024 · Updated last year
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) ☆8,989 · Feb 6, 2026 · Updated last week
- LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using torch.compile and CUDA graphs. ☆671 · Aug 22, 2025 · Updated 5 months ago
- AI for jq ☆249 · Sep 20, 2024 · Updated last year
- ☆23 · Mar 25, 2025 · Updated 10 months ago
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆112 · Jan 6, 2024 · Updated 2 years ago
- Serverless Optimized MODules - A Serverless Framework to create reusable micro apps ☆18 · Jul 7, 2025 · Updated 7 months ago
- LLM plugin for models hosted by Anyscale Endpoints ☆35 · Apr 22, 2024 · Updated last year
- Source code for ICLR 2021 paper: "Molecule Optimization by Explainable Evolution" ☆30 · May 29, 2021 · Updated 4 years ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆986 · Jul 23, 2024 · Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Feb 5, 2025 · Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆35 · Apr 17, 2025 · Updated 10 months ago
- A comprehensive template for aligning large language models (LLMs) using Reinforcement Learning from Human Feedback (RLHF), transfer lear… ☆35 · Dec 15, 2024 · Updated last year
- The creative suite for character-driven AI experiences. ☆190 · Sep 6, 2024 · Updated last year
- A curated list of reinforcement learning with human feedback resources (continually updated) ☆4,296 · Dec 9, 2025 · Updated 2 months ago
- LMQL implementation of tree of thoughts ☆36 · Jan 31, 2024 · Updated 2 years ago
- Test-Time Memory Framework: Control Hallucinations in Foundation Models ☆11 · Nov 4, 2025 · Updated 3 months ago
- ☆17 · Sep 1, 2024 · Updated last year
- Read, modify and write DICOS files with Python code ☆12 · Nov 24, 2025 · Updated 2 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆211 · May 17, 2024 · Updated last year