li-plus/nanoRLHF

Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)

☆18

Alternatives and similar repositories for nanoRLHF

Users who are interested in nanoRLHF are comparing it to the libraries listed below.

taishan1994 / chinese_llm_sft
Fine-tune large language models with instruction tuning.
☆11 Jun 28, 2023 Updated 3 years ago
omotolani12 / Building-an-Advanced-RAG-Chatbot-with-Knowledge-Graphs
☆12 Jun 12, 2024 Updated 2 years ago
xubodhu / VisualPT-MoE
☆10 Mar 30, 2023 Updated 3 years ago
ZhaXionghui / llama3-from-scratch-numpy_and_lora-fine-tune
Implements the llama3 inference pipeline from scratch using numpy and wraps it, comparing performance on GPU and CPU, along with LoRA fine-tuning.
☆11 Jul 16, 2024 Updated 2 years ago
cydu24 / HER
☆23 Jan 30, 2026 Updated 5 months ago
xubodhu / RDS
☆13 Sep 27, 2022 Updated 3 years ago
sherwyn11 / Social-Distancing-Analyzer
Social Distancing Analyzer using OpenCV and YOLO
☆10 Aug 30, 2024 Updated last year
yyaghoobzadeh / figment
FIGMENT
☆15 Jan 27, 2020 Updated 6 years ago
KTH-Nek5000 / PipeMesh
gmsh meshing for pipes
☆10 Oct 21, 2021 Updated 4 years ago
Harry-Chan / seq2seqlm-on-qg
☆13 Feb 9, 2022 Updated 4 years ago
IndoNLP / cendol
Indonesian T0 | Instruction-tuning for low-resource and extremely low-resource Austronesian languages
☆18 Jun 24, 2024 Updated 2 years ago
yuchenlin / ParaGEN
Neural Paraphrase Generation based on OpenNMT-py
☆12 Jan 2, 2018 Updated 8 years ago
Purshow / Awesome-LVLM-Hallucination
☆56 Nov 26, 2024 Updated last year
rllab-snu / Spectral-Risk-Constrained-RL
Official GitHub repository for "Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees" (NeurIPS 2024).
☆11 Nov 30, 2025 Updated 7 months ago
MIT-REALM / efppo
☆11 Mar 5, 2024 Updated 2 years ago
yanqiangmiffy / amp-pytorch
PyTorch automatic mixed precision (AMP) training template
☆18 Apr 6, 2022 Updated 4 years ago
DianaPajon / tiger
A Haskell implementation of the tiger compiler
☆10 May 2, 2020 Updated 6 years ago
vast-ai / vast-pyworker
☆12 May 20, 2025 Updated last year
TengFeiHan0 / Object-Detection.pytorch
This repo consists of some experimental results on bdd100k datasets using different object detection algorithms (Faster-RCNN, FCOS, ATSS).
☆11 Jun 27, 2020 Updated 6 years ago
SimonZhan-code / Step-Wise_SafeRL_Pixel
Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations"
☆11 Apr 5, 2024 Updated 2 years ago
mahaitongdae / Safety_Index_Synthesis
Code for L4DC 2022 paper: Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning.
☆15 Jul 31, 2023 Updated 2 years ago
w11wo / nlp-datasets
A collection of various NLP datasets, mainly Indonesia-related languages.
☆15 Apr 23, 2022 Updated 4 years ago
YdrMaster / llama2.rs
Experiment: llama2 inference implemented in Rust
☆17 Feb 23, 2024 Updated 2 years ago
BrianPulfer / vision-retention-networks
Unofficial reimplementation of ViR: Vision Retention Networks by Hatamizadeh et al. (https://arxiv.org/abs/2310.19731)
☆18 Jul 26, 2024 Updated last year
nt591 / monkey-ocaml
OCaml code from Writing an Interpreter in Go
☆11 Aug 16, 2019 Updated 6 years ago
seraj94ai / vehicle-tracking
use simple image processing to detect cars in videos
☆11 Sep 7, 2018 Updated 7 years ago
intelligent-control-lab / Implicit_Safe_Set_Algorithm
☆15 Aug 7, 2025 Updated 11 months ago
rayandrew / indonesian-image-captioning
Indonesian Image Captioning using Attention-based Semantic Compositional Networks
☆13 Jul 31, 2019 Updated 6 years ago
LuminosityX / MM-Forecast
Implementation of our paper, "MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models".
☆18 Apr 16, 2025 Updated last year
UCSB-AI / SafeKey
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
☆16 May 12, 2026 Updated 2 months ago
negin513 / distributed-pytorch-hpc
Example workflows for executing multi-node, multi-GPU machine learning training using PyTorch on NCAR's HPC Supercomputer (Derecho).
☆16 May 6, 2026 Updated 2 months ago
RedSearchAgent / DeepTraceHub
RedSearcher's framework for deep search agent trajectory synthesis, QA filtering, and model evaluation, supporting ReACT and DeepSeek-sty…
☆23 Feb 26, 2026 Updated 4 months ago
liu-hz18 / Prompt-GLM
Prompt Fine-tuning on GLM, BART and Flan-T5.
☆21 Jan 20, 2023 Updated 3 years ago
protozis / LBM_CYMB
Lattice Boltzmann Method for multiple moving cylinders in C and OpenCL.
☆10 Jan 2, 2022 Updated 4 years ago
kangfend / bahasa
Natural language toolkit for Indonesian Language (Bahasa)
☆20 May 16, 2024 Updated 2 years ago
lawreyios / GetRandomDog
☆15 Apr 5, 2020 Updated 6 years ago
bupticybee / gym_chinese_chess
Gym environment for Chinese chess (xiangqi)
☆15 May 25, 2020 Updated 6 years ago
kkdai / raftserver
An RPC server implementation based on the Raft paper, in Golang
☆10 Jun 17, 2016 Updated 10 years ago
nwtnni / tigerc
Compiler for the Tiger programming language
☆12 Oct 27, 2018 Updated 7 years ago