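This is a learning codebase for Reinforcement Learning from Human Feedback (RLHF), built from scratch. It implements PPO, GRPO, GSPO, and related policy optimization algorithms, and provides a clear, reproducible training pipeline. Because the documentation was converted from LaTeX files, if the Markdown files render incorrectly, open them with a Markdown extension in VS Code.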
☆87 · Dec 19, 2025 · Updated 4 months ago
Alternatives and similar repositories for RLHF_learn
Users who are interested in RLHF_learn are comparing it to the libraries listed below.
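- Official code for the paper "Understanding Co-speech Gestures in-the-wild" ☆24 · Oct 31, 2025 · Updated 5 months ago
- Implementation of my agent used in the 2025 AFAC TianChi competition ☆28 · Oct 6, 2025 · Updated 6 months ago
- Make one prompt become an immersive, production‑ready experience: a single pipeline for Text → Image → Music → Lights → Video, with real … ☆68 · Sep 5, 2025 · Updated 7 months ago
- "Resume Matcher Agent" is an AI-driven platform that reverse-engineers hiring algorithms to show you how to tailor your resume precisely. Get the keywords, formatting, and insights that carry you past the initial screening and into human review. resume-matcher-agent simulates how HR screens your resume and shows you the screening conclusions in advance, so that you can quickly finish revising your… ☆59 · Sep 9, 2025 · Updated 7 months ago
- Lab assignments for the Principles of Compilers course, Department of Computer Science, Nanjing University ☆22 · Jun 12, 2020 · Updated 5 years ago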
- LLM implementation in PyTorch, with MoE and RoPE support ☆65 · Updated this week
- This repository contains the example source code for every chapter of Understanding Unix/Linux Programming: A Guide to Theory and Practice ☆12 · Oct 16, 2021 · Updated 4 years ago
- This repository records and shares my learning journey in the LLM and Agent fields, using hands-on projects to understand the underlying techniques in depth. By building LLM- and Agent-based applications from scratch, I learn how LLMs work and gain Agent development experience. ☆25 · Mar 28, 2025 · Updated last year
- Past exam materials from the School of Software Engineering, South China University of Technology ☆16 · Dec 6, 2021 · Updated 4 years ago
- LangGraph agent template with MCP. ☆32 · Apr 8, 2025 · Updated last year
- A lab to practice RAG techniques. ☆42 · Sep 7, 2025 · Updated 7 months ago
- A network traffic dataset ☆21 · Mar 5, 2021 · Updated 5 years ago
- A theoretical and practical deep dive into Reinforcement Learning with Human Feedback and its applications in Large Language Models from… ☆112 · Nov 7, 2025 · Updated 5 months ago
- Your next SCNU undergraduate thesis LaTeX template ☆38 · Apr 19, 2025 · Updated last year
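- Multi-Modal-AI-Orchestrator (Reset version), AI Full-modal Full-agent: Text → Image → Music → Lights → Video, Includes "Scenario Director,… ☆97 · Nov 5, 2025 · Updated 5 months ago
- NTU EEE postgraduate: shared project notes, assignments, and exercise answers ☆111 · May 10, 2025 · Updated 11 months ago
- Huashi (华师) graduation thesis template, Huashi undergraduate thesis template, Huashi thesis template, LaTeX template, graduation thesis template, SCNU, SCNU thesis template, SCNU undergraduate thesis template ☆52 · Jun 4, 2021 · Updated 4 years ago
- Professional LaTeX resume template for LLM & Agent algorithm engineers ☆199 · Dec 16, 2025 · Updated 4 months ago
- A study guide for new students joining the research group ☆157 · Apr 2, 2026 · Updated 2 weeks ago
- Paper reading notes blog (built with GitHub Actions and GitHub Issues). ☆59 · Sep 19, 2025 · Updated 7 months ago
- 🍽️ An AI food recommendation assistant based on graph RAG: a hands-on case study from the Datawhale all-in-rag tutorial, integrating the Neo4j graph database, Milvus vector retrieval, and an intelligent dialogue system ☆156 · Feb 6, 2026 · Updated 2 months ago
- A very simple, easy-to-understand knowledge base implementation based on RAG ☆74 · Oct 13, 2024 · Updated last year
- Generate Multi Planar Reconstruction from CT scan data using VTK-PYTHON. ☆65 · Dec 16, 2014 · Updated 11 years ago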
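- Publishes notices for the 2023 computer science postgraduate-recommendation summer camps and pre-recommendation admissions, along with experience posts from previous years; for postgraduate-recommendation mentoring or computer science postgraduate-recommendation materials, contact QQ: 1585601434 ☆287 · Mar 11, 2024 · Updated 2 years ago
- Local RAG researcher agent built using Langgraph, DeepSeek R1 and Ollama ☆139 · Feb 13, 2025 · Updated last year
- This repository is an official Tensorflow 2 implementation of Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint… ☆81 · Apr 25, 2022 · Updated 3 years ago
- Tianchi competition: user behavior prediction challenge in a news recommendation scenario, solo competition, ranked 5/5338 on leaderboard B ☆75 · Mar 16, 2021 · Updated 5 years ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆131 · Nov 11, 2024 · Updated last year
- A detailed explanation of the Stacking method in ensemble learning ☆83 · Sep 20, 2019 · Updated 6 years ago
- [CVPR2025] Official Implementation of ILLUME+ ☆126 · Aug 20, 2025 · Updated 7 months ago
- AI projects (reinforcement learning, deep learning, computer vision, recommender systems, natural language processing, machine navigation, medical image processing) ☆92 · Aug 8, 2023 · Updated 2 years ago
- ☆84 · Mar 15, 2023 · Updated 3 years ago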
- CVPR2023 - Rethinking Federated Learning with Domain Shift: A Prototype View ☆116 · Dec 29, 2024 · Updated last year
- some scripts of aseprite e.g. psd exporter ☆129 · Mar 6, 2026 · Updated last month
- How to build a Multi-Agentic Systems for RAG using LangGraph - Full project ☆214 · Jan 10, 2025 · Updated last year
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ☆226 · Mar 31, 2025 · Updated last year
- [ICLR2023] Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation (CDCD). ☆162 · Apr 5, 2023 · Updated 3 years ago
- ☆155 · Mar 20, 2026 · Updated 3 weeks ago
- Study notes on search, advertising, and recommendation: Wang Shusen's "Recommender Systems" course ☆212 · Nov 30, 2024 · Updated last year