haohaoXhang/RLHF_learn

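This is a learning codebase for reinforcement learning from human feedback (RLHF), built from scratch. It implements PPO, GRPO, GSPO, and related policy optimization algorithms, and provides a clear, reproducible training pipeline. Because the documentation was converted from LaTeX files, open the md files with a Markdown extension in VS Code if they render incorrectly.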

☆90

Alternatives and similar repositories for RLHF_learn

Users who are interested in RLHF_learn are comparing it to the repositories listed below.

dong845 / MCP-MedSAM
☆12 · Nov 13, 2025 · Updated 7 months ago
101yang101 / CZY_ChatBot
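This project is a multi-agent system built on LangChain with a Streamlit web interface. It runs web searches based on user input and provides travel-related chat services. It also supports sales recommendations drawn from a local knowledge base, offering users personalized travel product suggestions.
☆15 · Apr 20, 2025 · Updated last year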
windrider / cs149_asst1
C++ implementation of Stanford CS149 Assignment 1
☆14 · Feb 19, 2023 · Updated 3 years ago
amro-kamal / ObjectPose
☆13 · Jul 19, 2022 · Updated 3 years ago
qq749812679 / Multi-Modal-AI-Orchestrator
Make one prompt become an immersive, production‑ready experience: a single pipeline for Text → Image → Music → Lights → Video, with real …
☆74 · Sep 5, 2025 · Updated 10 months ago
Oliver-242 / HUST-Embedded-System
Embedded Systems course, Huazhong University of Science and Technology (HUST), class of 2020
☆19 · Dec 1, 2022 · Updated 3 years ago
pihang / LLM_Learning_ph
Notes on pretraining an LLM from scratch, SFT, RLHF, and DPO, plus interview questions
☆21 · Sep 2, 2024 · Updated last year
zhangzg1 / rag_with_chat
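A RAG-based knowledge question-answering system that combines LLMs, LangChain, prompt engineering, an optimized knowledge-base structure and retrieval-generation pipeline, and the vLLM inference optimization framework
☆24 · Mar 12, 2025 · Updated last year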
MadryLab / journey-TRAK
Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"
☆25 · Dec 12, 2023 · Updated 2 years ago
howarlii / material.SCUTSE
Past exam materials from the School of Software Engineering, South China University of Technology
☆16 · Dec 6, 2021 · Updated 4 years ago
lixx-backup / RAG-Retrieval-Augmented-Generation-
A lab to practice RAG techniques.
☆44 · Sep 7, 2025 · Updated 10 months ago
xiaohuiduan / network-traffic-dataset
A dataset of network traffic
☆21 · Mar 5, 2021 · Updated 5 years ago
snakers4 / mnasnet-pytorch
A PyTorch implementation of MNASNET
☆47 · Sep 3, 2018 · Updated 7 years ago
theaifutureguy / AI-Lawyer---RAG-with-DeepSeek-R1
AI-powered legal chatbot that leverages Retrieval-Augmented Generation (RAG) with DeepSeek R1 for advanced legal reasoning and document …
☆36 · Jun 20, 2025 · Updated last year
laura-ham / HM-Fashion-image-neural-search
H&M Fashion Image similarity search with Weaviate and DocArray
☆43 · Mar 18, 2024 · Updated 2 years ago
kjaved0 / awesome-continual-learning
A repository to keep track of literature on catastrophic forgetting
☆37 · Mar 10, 2020 · Updated 6 years ago
KuRRe8 / eeemsc-coursework
NTU EEE postgraduate program: shared notes, assignments, and exercise solutions
☆116 · May 10, 2025 · Updated last year
emecercelik / ssl-3d-detection
☆41 · Nov 1, 2023 · Updated 2 years ago
shemhamforash23 / lightrag-mcp
☆117 · Jun 8, 2026 · Updated last month
LAMDA-NeSy / Lab-RoadMap
A study guide for new students joining the group
☆186 · Apr 26, 2026 · Updated 2 months ago
Grandzxw / TripleMixer
[TIP 2025] TripleMixer: A Triple-Domain Mixing Model for Point Cloud Denoising under Adverse Weather
☆78 · Apr 14, 2026 · Updated 2 months ago
huangyf2013320506 / magic_conch_backend
☆173 · Feb 25, 2025 · Updated last year
FangXiuwen / Robust_FL
☆53 · Dec 31, 2024 · Updated last year
datawhalechina / hello-generic-agent
📚 "Generic Agent User Guide": get started easily with self-evolving agents, covering everything from basic calls to advanced techniques
☆535 · May 12, 2026 · Updated last month
adongwanai / LLM-Resume-Template
Professional LaTeX resume template for LLM & Agent algorithm engineers
☆338 · Dec 16, 2025 · Updated 6 months ago
ztmzzz / UESTC_course_data
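Materials I compiled during my own studies, including lab reports. Big data assignment answers and lab reports from the University of Electronic Science and Technology of China (UESTC).
☆67 · Nov 12, 2025 · Updated 7 months ago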
Doragd / PaperReading
A paper reading log blog (built with GitHub Actions and GitHub Issues).
☆61 · Sep 19, 2025 · Updated 9 months ago
FutureUnreal / What-to-eat-today
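🍽️ An AI food recommendation assistant based on graph RAG: a hands-on case study from the Datawhale all-in-rag tutorial, integrating the Neo4j graph database, Milvus vector retrieval, and an intelligent dialogue system
☆198 · Feb 6, 2026 · Updated 5 months ago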
kaymen99 / local-rag-researcher-deepseek
Local RAG researcher agent built using LangGraph, DeepSeek R1 and Ollama
☆142 · Feb 13, 2025 · Updated last year
HuangCongQing / pcdet-note
Annotated notes on the key parts of the OpenPCDet code
☆73 · Jun 20, 2024 · Updated 2 years ago
ifwind / code_framework_pytorch
A PyTorch model training framework for quickly designing a model and starting to train it
☆70 · Mar 21, 2022 · Updated 4 years ago
BTDLOZC-SJTU / CompetitionTianChi_newsRecommendation
Tianchi competition: User Behavior Prediction Challenge in a news recommendation scenario; solo entry, ranked 5/5338 on leaderboard B
☆78 · Mar 16, 2021 · Updated 5 years ago
liujunwen23 / MIRE
WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge
☆132 · Nov 11, 2024 · Updated last year
illume-unified-mllm / ILLUME_plus
[CVPR2025] Official Implementation of ILLUME+
☆126 · Aug 20, 2025 · Updated 10 months ago
blakechen97 / SASA
SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection
☆94 · Feb 18, 2022 · Updated 4 years ago
CS-BAOYAN / CS-BAOYAN-2022
☆83 · Mar 15, 2023 · Updated 3 years ago
wolf-bailang / AI-Projects
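AI projects (reinforcement learning, deep learning, computer vision, recommender systems, natural language processing, machine navigation, medical image processing)
☆93 · Aug 8, 2023 · Updated 2 years ago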
JinkyuKimUCB / BDD-X-dataset
Berkeley Deep Drive-X (eXplanation) dataset
☆132 · Jan 18, 2019 · Updated 7 years ago
WenkeHuang / RethinkFL
CVPR2023 - Rethinking Federated Learning with Domain Shift: A Prototype View
☆118 · Dec 29, 2024 · Updated last year