jackfsuia / nanoRLHF
RLHF experiments on a single A100 40GB GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
☆79 · Feb 19, 2025 · Updated 11 months ago
Alternatives and similar repositories for nanoRLHF
Users who are interested in nanoRLHF are comparing it to the libraries listed below.
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Jul 22, 2023 · Updated 2 years ago
- ☆13 · Sep 12, 2024 · Updated last year
- pip install poai ☆14 · Jul 7, 2025 · Updated 7 months ago
- Fast LLM training codebase with dynamic strategy selection [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆40 · Jan 4, 2024 · Updated 2 years ago
- GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing. ☆15 · Nov 8, 2023 · Updated 2 years ago
- Reading seminar of the Harvard Cloud Networking and Systems Group ☆16 · Aug 29, 2022 · Updated 3 years ago
- Finding bugs in P4 compilers using translation validation. ☆38 · Nov 4, 2025 · Updated 3 months ago
- Accelerate vector generation using an ONNX model ☆18 · Jan 23, 2024 · Updated 2 years ago
- ☆11 · Feb 6, 2026 · Updated last week
- Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling ☆16 · Sep 27, 2023 · Updated 2 years ago
- Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization ☆21 · Mar 12, 2025 · Updated 11 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 · Jul 4, 2025 · Updated 7 months ago
- Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale ☆19 · May 27, 2020 · Updated 5 years ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆20 · Feb 23, 2024 · Updated last year
- The objective of this project is to demonstrate how to fine-tune deepseek-r1-distill-llama-8b. ☆16 · Feb 19, 2025 · Updated 11 months ago
- WATERMELON: Multi-Agent Reinforcement Learning Based Algorithmic Stock Trading System with GUI Application ☆17 · Sep 8, 2022 · Updated 3 years ago
- [NeurIPS '24] Code repo for the paper entitled "Learning Structured Representations with Hyperbolic Embeddings" at NeurIPS 2024 ☆24 · Jan 22, 2025 · Updated last year
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere… ☆22 · May 9, 2025 · Updated 9 months ago
- ☆20 · Jun 3, 2023 · Updated 2 years ago
- Getting Started with NIMBUS-CORE ☆10 · Dec 16, 2023 · Updated 2 years ago
- *flow source code ☆23 · Aug 27, 2020 · Updated 5 years ago
- ☆19 · Jul 1, 2020 · Updated 5 years ago
- A fluent, scalable, and easy-to-use LLM data processing framework. ☆28 · Jan 31, 2026 · Updated 2 weeks ago
- A minimalist benchmarking tool designed to test the routine-generation capabilities of LLMs. ☆27 · Nov 28, 2024 · Updated last year
- Face recognition using PyTorch (ArcFace, CosFace, Center Loss) ☆22 · Nov 22, 2022 · Updated 3 years ago
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆25 · Nov 21, 2024 · Updated last year
- ☆26 · Aug 31, 2023 · Updated 2 years ago
- A random collection of research papers/projects I am interested in ☆20 · May 20, 2021 · Updated 4 years ago
- ☆56 · Jan 25, 2021 · Updated 5 years ago
- ☆24 · Jul 7, 2024 · Updated last year
- [MLSys 2022] "BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node … ☆56 · Oct 6, 2023 · Updated 2 years ago
- ☆55 · Apr 7, 2022 · Updated 3 years ago
- Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**. ☆28 · Apr 25, 2023 · Updated 2 years ago
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image and image-to-image search. ☆28 · Feb 26, 2024 · Updated last year
- Reproduce R1 Zero on Logic Puzzle ☆2,432 · Mar 20, 2025 · Updated 10 months ago
- Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, … ☆29 · Feb 23, 2025 · Updated 11 months ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- ☆124 · May 28, 2024 · Updated last year
- A simple deep learning framework inspired by DeZero and PyTorch ☆31 · Jan 27, 2025 · Updated last year