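This is a from-scratch learning codebase for reinforcement learning from human feedback (RLHF). It implements PPO, GRPO, GSPO, and related policy optimization algorithms, and provides a clear, reproducible training pipeline. Because the documentation was converted from LaTeX files, open the md files with a VS Code Markdown extension if they render incorrectly.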
☆77Dec 19, 2025Updated 2 months ago
Alternatives and similar repositories for RLHF_learn
Users that are interested in RLHF_learn are comparing it to the libraries listed below
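- This project is a multi-agent system built on LangChain with a Streamlit web interface. It can run web searches based on user input and provide travel-related chat services. The system also has a sales feature based on a local knowledge base, offering users personalized travel product recommendations.☆15Apr 20, 2025Updated 10 months ago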
- ☆10Sep 30, 2024Updated last year
- ☆13Jul 19, 2022Updated 3 years ago
- Implementation of my agent used in 2025 AFAC TianChi competition☆28Oct 6, 2025Updated 5 months ago
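- "Resume Matcher Agent" is an AI-driven platform that reverse-engineers hiring algorithms to show you how to tailor your resume precisely. Get the keywords, formatting, and insights that carry you past initial screening and into human review. resume-matcher-agent simulates how HR screens your resume and shows you the screening verdict in advance, so that you can quickly finish revising your…☆53Sep 9, 2025Updated 5 months ago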
- Official code for the paper "Understanding Co-speech Gestures in-the-wild"☆20Oct 31, 2025Updated 4 months ago
- [NeurIPS'25 Spotlight🔥]Official implementation of Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Ma…☆27Dec 13, 2025Updated 2 months ago
- ☆16Feb 23, 2025Updated last year
- Implements an LLM in PyTorch, with support for MoE and RoPE☆41Jan 29, 2026Updated last month
- A lab to practice RAG techniques.☆36Sep 7, 2025Updated 6 months ago
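- This repository records and shares my learning journey in the LLM and Agent field and deepens my understanding of the related techniques through hands-on projects. By building LLM- and Agent-based applications from scratch, I learn LLM principles and gain Agent development experience.☆24Mar 28, 2025Updated 11 months ago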
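- A RAG-based knowledge question-answering system that combines LLMs, Langchain, prompt engineering, an optimized knowledge base structure and retrieval-generation pipeline, the vllm inference optimization framework, and other techniques☆23Mar 12, 2025Updated 11 months ago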
- Past exam materials from the School of Software Engineering, South China University of Technology☆17Dec 6, 2021Updated 4 years ago
- A RAG project based on large language models, implementing both text-based and knowledge-graph-based RAG☆27Dec 11, 2025Updated 2 months ago
- Make one prompt become an immersive, production‑ready experience: a single pipeline for Text → Image → Music → Lights → Video, with real …☆57Sep 5, 2025Updated 6 months ago
- LangGraph agent template with MCP.☆30Apr 8, 2025Updated 11 months ago
- Multi-Modal-AI-Orchestrator (Reset version), AI Full-modal Full-agent: Text → Image → Music → Lights → Video, Includes "Scenario Director,…☆87Nov 5, 2025Updated 4 months ago
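- 🍽️ AI food recommendation assistant based on graph RAG - a hands-on case study from the Datawhale all-in-rag tutorial, integrating the Neo4j graph database, Milvus vector retrieval, and an intelligent dialogue system☆115Feb 6, 2026Updated last month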
- Professional LaTeX resume template for LLM & Agent algorithm engineers☆117Dec 16, 2025Updated 2 months ago
- A study guide for new students joining the group☆133Updated this week
- ☆52Dec 31, 2024Updated last year
- A paper-reading log blog (built on GitHub Action and GitHub Issue).☆58Sep 19, 2025Updated 5 months ago
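- SCNU graduation thesis template, SCNU undergraduate graduation thesis template, SCNU thesis template, LaTeX template, graduation thesis template, SCNU, SCNU thesis template, SCNU undergraduate thesis template☆52Jun 4, 2021Updated 4 years ago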
- Improved Precision and Recall Metric for Assessing Generative Models - Unofficial Pytorch Implementation☆56Jan 24, 2022Updated 4 years ago
- Tianchi competition: user behavior prediction challenge in a news recommendation scenario, solo competition, ranked 5/5338 on leaderboard B☆71Mar 16, 2021Updated 4 years ago
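- Materials, lab reports, and other notes I compiled during my studies. University of Electronic Science and Technology of China, big data, assignment answers, lab reports☆69Nov 12, 2025Updated 3 months ago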
- Generate Multi Planar Reconstruction from CT scan data using VTK-PYTHON.☆65Dec 16, 2014Updated 11 years ago
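- A repository of historical materials from the Department of Computer Science and Technology, Tongji University, covering summer camps, pre-recommendation admission, and the application season. Contributions from fellow students are welcome~☆78Jun 2, 2023Updated 2 years ago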
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆86Dec 14, 2023Updated 2 years ago
- This repository is an official Tensorflow 2 implementation of Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint…☆82Apr 25, 2022Updated 3 years ago
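- AI projects (reinforcement learning, deep learning, computer vision, recommender systems, natural language processing, machine navigation, medical image processing)☆92Aug 8, 2023Updated 2 years ago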
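- Posts notices for the 2023 computer science postgraduate-recommendation summer camps and pre-recommendation admissions, plus experience posts from previous years; for recommendation coaching or computer science recommendation materials, contact QQ: 1585601434☆289Mar 11, 2024Updated last year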
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection☆95Feb 18, 2022Updated 4 years ago
- CVPR2023 - Rethinking Federated Learning with Domain Shift: A Prototype View☆115Dec 29, 2024Updated last year
- ☆84Mar 15, 2023Updated 2 years ago
- Search, advertising, and recommendation study notes: Wang Shusen's "Recommender Systems" course☆199Nov 30, 2024Updated last year
- ☆150Feb 9, 2026Updated 3 weeks ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge☆131Nov 11, 2024Updated last year
- A MATLAB-based license plate recognition system☆166Jul 5, 2013Updated 12 years ago