ash80/RLHF_in_notebooks

RLHF (Supervised fine-tuning, reward model, and PPO) step-by-step in 3 Jupyter notebooks

☆253

Alternatives and similar repositories for RLHF_in_notebooks

Users who are interested in RLHF_in_notebooks are comparing it to the libraries listed below.

nestordemeure / shelly
An LLM based shell assistant that knows your usual shell commands.
☆17 · Jul 18, 2025 · Updated last year
ninehills / embedding_finetuning
Fine-tuning embedding models.
☆14 · Nov 25, 2024 · Updated last year
bwasti / gt
[experimental] multiplexed distributed tensor framework
☆22 · Nov 17, 2025 · Updated 8 months ago
PBDESG / nnViewer
☆10 · Jan 23, 2025 · Updated last year
therealoliver / Deepdive-llama3-from-scratch
Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code.
☆632 · Feb 24, 2025 · Updated last year
yousef-rafat / miniDiffusion
A reimplementation of Stable Diffusion 3.5 in pure PyTorch
☆706 · Jun 14, 2025 · Updated last year
ashworks1706 / rlhf-from-scratch
A theoretical and practical deep dive into Reinforcement Learning with Human Feedback and its applications in Large Language Models from…
☆115 · Nov 7, 2025 · Updated 8 months ago
goyalpramod / Foundational-ML-papers
Implementations of papers that I read; you can read my breakdown in my blog
☆91 · Oct 23, 2025 · Updated 9 months ago
jmaczan / text-to-ml
Programmable automated machine learning - proof of concept
☆15 · Oct 9, 2024 · Updated last year
AbdullahAbuHassann / GenerativeAICourse
☆561 · Jul 1, 2025 · Updated last year
AIDajiangtang / LLM-from-scratch
Learn large models from scratch: Transformer, GPT2, and BERT pre-training and fine-tuning
☆41 · Jul 1, 2024 · Updated 2 years ago
ronething / m3u8player
yet another m3u8 player
☆13 · Jun 8, 2025 · Updated last year
rasbt / reasoning-from-scratch
Implement a reasoning LLM in PyTorch from scratch, step by step
☆4,829 · Updated this week
camenduru / LLaVA-OneVision-jupyter
☆13 · Aug 12, 2024 · Updated last year
haizelabs / thorn-in-haizestack
Thorn in a HaizeStack test for evaluating long-context adversarial robustness.
☆26 · Aug 3, 2024 · Updated last year
0xbbdd / ShadowyCompression
A JPEG Image Compression Service using Part Homomorphic Encryption.
☆31 · Mar 7, 2025 · Updated last year
genbs / poste-italiane-parser
A Python tool to parse PDF statements from Poste Italiane (Postepay, BancoPosta) and extract data as structured JSON.
☆50 · Jul 25, 2025 · Updated last year
PaulPauls / llama3_interpretability_sae
A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full…
☆640 · Mar 23, 2025 · Updated last year
thisisanshgupta / Senna
Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan…
☆19 · Sep 5, 2024 · Updated last year
tobiaskauer / the-end-is-near
Suffixes of German town and village names
☆10 · May 4, 2020 · Updated 6 years ago
fzliu / radient
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
☆281 · Mar 2, 2026 · Updated 4 months ago
multiplexerai / Chat-histor-RAG
☆25 · Feb 18, 2024 · Updated 2 years ago
Khalil-Rehman9 / CaptionAI
A powerful and user-friendly tool that generates detailed captions for your images
☆21 · Nov 11, 2024 · Updated last year
kpetrovicc / TGR
Temporal Graph Rewiring Method with Expander Graphs
☆12 · Oct 18, 2024 · Updated last year
gavinkhung / machine-learning-visualized
ML algorithms implemented and derived from first-principles in Jupyter Notebooks and NumPy
☆1,750 · Jul 8, 2026 · Updated 3 weeks ago
disksing / fake-screenshot
☆18 · Jul 7, 2024 · Updated 2 years ago
AaronFeng753 / Better-Qwen3
Auto Thinking Mode switch for Qwen3 in Open webui
☆72 · May 8, 2025 · Updated last year
bertmaher / tf32_gemm
Example of binding a TF32 CUTLASS GEMM kernel to PyTorch
☆12 · Jun 7, 2024 · Updated 2 years ago
Centrattic / global-cot-analysis
Global CoT Analysis: Initial attempts to uncover patterns across many chains of thought
☆20 · Feb 10, 2026 · Updated 5 months ago
CelVoxes / ceLLama
Cell type annotation with local Large Language Models (LLMs) - Ensuring privacy and speed with extensive customized reports
☆152 · Oct 25, 2024 · Updated last year
cfgranda / ps4ds
Probability and Statistics for Data Science: A self-contained introduction to probability and statistics for data science, including a fr…
☆607 · Jul 18, 2026 · Updated last week
proj-airi / webai-examples
🧠 Web AI / LLM in browser / Whisper in browser / WebGPU inference Examples
☆35 · Oct 1, 2025 · Updated 9 months ago
Zaki101Aslam / MS-office-shortcuts-for-Libre-Office
Microsoft Office Shortcut keys for Libre Office to make it feel more familiar
☆42 · Jul 3, 2026 · Updated 3 weeks ago
jeremicna / deepdream-video-pytorch
DeepDream for video with temporal consistency. Features RAFT optical flow estimation and occlusion masking to prevent ghosting. A PyTorch…
☆62 · Jul 2, 2026 · Updated 3 weeks ago
Fringe210 / llama.cpp-deepseek-v4-flash-cuda
Experimental implementation of DeepSeek v4 flash in llama.cpp
☆23 · Apr 30, 2026 · Updated 2 months ago
nirw4nna / dsc
Tensor library & inference framework for machine learning
☆118 · Oct 3, 2025 · Updated 9 months ago
mlvanguards / custom-siri
☆18 · Sep 8, 2025 · Updated 10 months ago
khoj-ai / terrarium
A simple Python sandbox for helpful LLM data agents
☆16 · May 4, 2025 · Updated last year
langchain-ai / agents-from-scratch
Build an email assistant with human-in-the-loop and memory
☆1,969 · Jun 15, 2026 · Updated last month