haohaoXhang / RLHF_learn
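This is an RLHF (reinforcement learning from human feedback) learning codebase built from scratch. It implements PPO, GRPO, GSPO, and related policy optimization algorithms, and provides a clear, reproducible training pipeline. Because the documentation was converted from LaTeX files, open the md files with a Markdown extension in VS Code if they render incorrectly.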
☆76Dec 19, 2025Updated last month
Alternatives and similar repositories for RLHF_learn
Users that are interested in RLHF_learn are comparing it to the libraries listed below
- ☆10Sep 30, 2024Updated last year
- ☆13Jul 19, 2022Updated 3 years ago
- Implements an LLM in PyTorch, with support for MoE and RoPE☆39Jan 29, 2026Updated 2 weeks ago
- ☆16Feb 23, 2025Updated 11 months ago
- A lab to practice RAG techniques.☆34Sep 7, 2025Updated 5 months ago
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"☆25Dec 12, 2023Updated 2 years ago
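- Past exam materials from the School of Software Engineering, South China University of Technology☆17Dec 6, 2021Updated 4 years ago
- An LLM-based RAG project that implements both text-based and knowledge-graph-based RAG☆27Dec 11, 2025Updated 2 months ago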
- Make one prompt become an immersive, production‑ready experience: a single pipeline for Text → Image → Music → Lights → Video, with real …☆56Sep 5, 2025Updated 5 months ago
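- A financial question-answering system based on LoRA fine-tuning of an LLM, combining PDF parsing, LLM fine-tuning, the vllm inference optimization framework, and related techniques☆50Mar 11, 2025Updated 11 months ago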
- Multi-Modal-AI-Orchestrator (Reset version), AI Full-modal Full-agent: Text → Image → Music → Lights → Video, Includes "Scenario Director,…☆80Nov 5, 2025Updated 3 months ago
- A repository to keep track of literature on catastrophic forgetting☆37Mar 10, 2020Updated 5 years ago
- ☆52Dec 31, 2024Updated last year
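- A repository of historical materials on summer camps, pre-recommendation admissions, and the application season at the Department of Computer Science and Technology, Tongji University. Contributions from fellow students are welcome.☆77Jun 2, 2023Updated 2 years ago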
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning☆86Dec 14, 2023Updated 2 years ago
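- Publishes 2023 notices for computer science postgraduate-recommendation summer camps and pre-recommendation admissions, plus experience posts from previous years; for recommendation coaching or computer science recommendation materials, contact QQ: 1585601434☆289Mar 11, 2024Updated last year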
- CVPR2023 - Rethinking Federated Learning with Domain Shift: A Prototype View☆115Dec 29, 2024Updated last year
- ☆84Mar 15, 2023Updated 2 years ago
- Pre-Trained Language Models for Interactive Decision-Making [NeurIPS 2022]☆130Jun 8, 2022Updated 3 years ago
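- Publishes 2023 notices for computer science postgraduate-recommendation summer camps and pre-recommendation admissions, plus experience posts from previous years; for recommendation coaching or computer science recommendation materials, contact QQ: 1585601434☆164Oct 2, 2024Updated last year
- A simple multimodal RAG project☆299May 13, 2025Updated 9 months ago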
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation☆220Mar 31, 2025Updated 10 months ago
- ☆183Mar 14, 2023Updated 2 years ago
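- Test-time Prompt Tuning (TPT) for zero-shot generalization in vision-language models (NeurIPS 2022)☆207Oct 21, 2022Updated 3 years ago
- This project designs a medical question-answering system based on RAG and large-model technology. It builds a knowledge graph from the DiseaseKG dataset with Neo4j, combines BERT-based named entity recognition with intent recognition by a 34B large model, and uses precise knowledge retrieval and answer generation to improve the system's performance in medical consultation, addressing problems of applying large models in the medical domain…☆1,067May 21, 2024Updated last year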
- [ICLR'25] MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models☆301Jan 22, 2025Updated last year
- This is the official repository for Retrieval Augmented Visual Question Answering☆243Dec 19, 2024Updated last year
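- Implementations of mainstream ranking algorithms for recommender systems☆282Oct 25, 2023Updated 2 years ago
- LLM knowledge sharing that anyone can understand; a must-read before LLM interviews in the spring and autumn recruitment seasons, so you can talk confidently with interviewers☆5,501Feb 5, 2026Updated last week
- 拼好RAG: a hand-built integration of GraphRAG, LightRAG, and Neo4j-llm-graph-builder for knowledge graph construction and search; incorporates DeepSearch to enable reasoning in private-domain RAG; includes a custom evaluation framework for GraphRAG | Integrate GraphRAG, LightRA…☆1,869Nov 5, 2025Updated 3 months ago
- Multi-Agent-GPT: a multimodal expert assistant GPT built on RAG and agents. It integrates tools for modalities such as text, image, and audio, and supports local deployment and private database construction.☆256Feb 22, 2025Updated 11 months ago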
- ☆383Apr 29, 2025Updated 9 months ago
- [CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space☆265Apr 1, 2021Updated 4 years ago
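- A collection of AIGC-interview/CV-interview/LLMs-interview questions and answers, also covering new ideas, new problems, new resources, and new projects from work and research☆2,754Oct 30, 2025Updated 3 months ago
- 2024 postgraduate-recommendation experience posts and related materials☆1,079Jul 7, 2024Updated last year
- RAGOnMedicalKG: combines large-model RAG with a knowledge graph (KG) to deliver demo-level question answering, aiming to provide a basic approach.☆339Mar 31, 2024Updated last year
- A collection of materials from the School of Computer Science at Huazhong University of Science and Technology (HUST) https://yuhangchen1.github.io/HUST_OPEN_SOURCE/☆554Jan 19, 2026Updated 3 weeks ago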
- Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"☆345Jan 15, 2022Updated 4 years ago
- Knowledge graph visualization☆349Apr 19, 2022Updated 3 years ago