RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
☆79Feb 19, 2025Updated last year
Alternatives and similar repositories for nanoRLHF
Users that are interested in nanoRLHF are comparing it to the libraries listed below
- QLoRA: Efficient Finetuning of Quantized LLMs☆11Jul 22, 2023Updated 2 years ago
- ☆13Sep 12, 2024Updated last year
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …☆13Dec 1, 2022Updated 3 years ago
- ☆14Mar 29, 2020Updated 5 years ago
- pip install poai☆14Mar 2, 2026Updated last week
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆40Jan 4, 2024Updated 2 years ago
- GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing.☆14Nov 8, 2023Updated 2 years ago
- Reading seminar in Harvard Cloud Networking and Systems Group☆16Aug 29, 2022Updated 3 years ago
- A simple implementation of ReasonGenRM.☆19Apr 21, 2025Updated 10 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code☆51Jul 4, 2025Updated 8 months ago
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling☆16Sep 27, 2023Updated 2 years ago
- ☆11Updated this week
- Accelerate vector generation using an ONNX model☆18Jan 23, 2024Updated 2 years ago
- ☆21Mar 23, 2022Updated 3 years ago
- Minimal hackable GRPO implementation☆328Jan 31, 2025Updated last year
- Source code for OSDI 2023 paper titled "Cilantro - Performance-Aware Resource Allocation for General Objectives via Online Feedback"☆40Jul 6, 2023Updated 2 years ago
- private-machine is an AI companion system with emotion, needs and goals simulation. Very silly, not based on real science.☆30Feb 26, 2026Updated last week
- WATERMELON: Multi-Agent Reinforcement Learning Based Algorithmic Stock Trading System with GUI Application☆17Sep 8, 2022Updated 3 years ago
- Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale☆19May 27, 2020Updated 5 years ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated 2 years ago
- Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization☆22Mar 12, 2025Updated 11 months ago
- The objective of this project is to demonstrate how to fine-tune deepseek-r1-distill-llama-8b.☆16Feb 19, 2025Updated last year
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere…☆22May 9, 2025Updated 10 months ago
- [NeurIPS '24] Code repo for the paper entitled "Learning Structured Representations with Hyperbolic Embeddings" at NeurIPS 2024☆24Jan 22, 2025Updated last year
- [NeurIPS25 Spotlight] EMPO, A Fully Unsupervised RLVR Method☆97Nov 24, 2025Updated 3 months ago
- ☆20Jun 3, 2023Updated 2 years ago
- ☆17Nov 1, 2024Updated last year
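- The General Digital Human System is an intelligent interaction platform built on deep learning and WebRTC, integrating Azure Avatar digital-human rendering, speech recognition and synthesis, natural language processing, and related technologies. It supports real-time conversation, knowledge Q&A, and emotional interaction, with smooth rendering above 30 FPS and low-latency responses under 200 ms. Core features include GPT-based intelligent dialogue, …☆28Dec 17, 2025Updated 2 months ago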
- Chinese rule based relation extraction☆15Feb 19, 2019Updated 7 years ago
- *flow source code☆23Aug 27, 2020Updated 5 years ago
- FusionFS: Fusing I/O Operations using CISCOps in Firmware File Systems, FAST '22☆19Apr 10, 2022Updated 3 years ago
- Getting Started with NIMBUS-CORE☆10Dec 16, 2023Updated 2 years ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Oct 18, 2025Updated 4 months ago
- ☆51Jun 14, 2024Updated last year
- [ICRA 2024] Official Implementation of the paper "Parameter-efficient Prompt Learning for 3D Point Cloud Understanding"☆28Feb 24, 2025Updated last year
- ☆19Jul 1, 2020Updated 5 years ago
- A fluent, scalable, and easy-to-use LLM data processing framework.☆28Jan 31, 2026Updated last month
- A minimalist benchmarking tool designed to test the routine-generation capabilities of LLMs.☆27Nov 28, 2024Updated last year
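- A legal document processing and judgment prediction model based on GLM-9B, jointly designed by China University of Political Science and Law and Beihang University☆29Sep 6, 2024Updated last year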