wlll123456/study_rlhf

☆108

Alternatives and similar repositories for study_rlhf

Users that are interested in study_rlhf are comparing it to the libraries listed below.

yuandaxia2001 / HealthAI-2025
☆169 • Mar 18, 2026 • Updated 4 months ago
Xinyi-0724 / Search-R1-Qwen3
Enhanced Search-R1 Implementation: Improved Compatibility and Modern Framework Integration
☆28 • Dec 8, 2025 • Updated 7 months ago
wyf3 / llm_related
Reproductions of LLM-related algorithms, plus some study notes
☆3,463 • Jul 2, 2026 • Updated 2 weeks ago
shibing624 / WebResearcher
WebResearcher: An Iterative Deep-Research Agent
☆49 • Feb 13, 2026 • Updated 5 months ago
Vacancy1016 / Multi-Agent-project
An agent system for financial investment research and decision support, aimed at individual investors
☆28 • Jul 9, 2026 • Updated last week
PeterGriffinJin / Search-R1
Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL
☆5,123 • Nov 13, 2025 • Updated 8 months ago
bcefghj / learn-MedicalGPT
🏥 From zero background to interview-ready: 20 lessons covering the full MedicalGPT medical LLM training pipeline | PT/SFT/LoRA/RLHF/DPO/GRPO | 100+ frequently asked interview topics
☆185 • Apr 1, 2026 • Updated 3 months ago
littlehh-xf / TXGR-GAME
2025 Tencent Generative Recommendation Advertising Algorithm Competition (深海大菠萝)
☆17 • Aug 3, 2025 • Updated 11 months ago
wicided / MedicalGPT
A personal learning project on fine-tuning medical LLMs
☆37 • Dec 3, 2025 • Updated 7 months ago
Sherlock1956 / ModelAlignmentFromScratch
☆45 • Nov 22, 2025 • Updated 7 months ago
nambo / menu-rag
Beyond Basic RAG, Empowering Real-Time Deep Research
☆20 • Sep 12, 2025 • Updated 10 months ago
John-Chen92 / tianchi-news-recommendation
Beginner's Introduction to Recommender Systems - News Recommendation, Top 2
☆44 • Mar 19, 2025 • Updated last year
mbzuai-oryx / MediX-R1
Open Ended Medical Reinforcement Learning
☆63 • Mar 15, 2026 • Updated 4 months ago
wdndev / tiny-mcp
MCP client / service implemented in Python
☆81 • May 7, 2025 • Updated last year
bbruceyuan / AI-Interview-Code
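Hand-written interview coding questions for AI algorithms, mainly LLMs plus search, advertising, and recommendation (not LeetCode), such as Self-Attention and AUC. These generally test a candidate's overall ability more than LeetCode does and sit closer to business practice and fundamentals.
☆598 • May 4, 2026 • Updated 2 months ago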
AkaliKong / MiniOneRec
Minimal reproduction of OneRec
☆1,709 • May 14, 2026 • Updated 2 months ago
Wood-Q / MokioMind
Hand-code an LLM in three hours for three yuan
☆539 • Mar 12, 2026 • Updated 4 months ago
thinkwee / AgentsMeetRL
Awesome List for Agentic RL
☆1,701 • Jun 20, 2026 • Updated last month
ASTRAL-Group / LoRe
When Reasoning Meets Its Laws
☆38 • Jan 2, 2026 • Updated 6 months ago
amine-akrout / customer-support-agentic-rag
An intelligent customer support system powered by LangGraph and LangChain that uses Retrieval-Augmented Generation (RAG) to provide accur…
☆20 • Jul 25, 2025 • Updated 11 months ago
TeenLucifer / vlm_reproduce
☆40 • Nov 16, 2025 • Updated 8 months ago
johnson7788 / EasyTrainAgent
A very simple way to train domain agents with supervised fine-tuning (SFT) and reinforcement learning (RL)
☆35 • Oct 20, 2025 • Updated 9 months ago
zxuu / RLHF
Implementations and study notes for RLHF algorithms used in LLMs
☆15 • Apr 13, 2025 • Updated last year
KMnO4-zx / llm-agent-rl-lab
Reproducing and studying RL algorithms for LLM agents, including PPO, GRPO, GSPO, DAPO, OPD and beyond.
☆33 • Updated this week
yizhu-joy / DataFilter
☆15 • Nov 29, 2025 • Updated 7 months ago
ZehyrW / WHU_OS_2021
Final operating systems lab project, class of 2021, School of Cyber Science and Engineering, Wuhan University
☆12 • Jan 2, 2024 • Updated 2 years ago
xkx-youcha / GR-movie-recommendation
A generative recommendation project based on the movielens-25m dataset
☆53 • Aug 6, 2025 • Updated 11 months ago
Jack-ctrl6 / GPT_KVcache_GQA
☆32 • Sep 4, 2025 • Updated 10 months ago
GNEHUY / Awesome-AgenticRAG_DeepResearch
🎯Awesome-AgenticRAG_DeepResearch: A curated list of resources on Agentic RAG & DeepResearch, with reference papers on how the two have developed
☆32 • Jun 12, 2026 • Updated last month
K1XE / InterviewForge
Local-first interview recording review reports with a Codex skill and CLI.
☆75 • May 16, 2026 • Updated 2 months ago
li-xiu-qi / spark_multi_rag
iFLYTEK Multimodal RAG Image-Text Question Answering Challenge
☆74 • Aug 4, 2025 • Updated 11 months ago
qiufengqijun / open-r1-reprod
A reproduction of open-r1 that trains 0.5B, 1.5B, 3B, and 7B qwen models with GRPO and reports some interesting observations.
☆64 • Apr 13, 2025 • Updated last year
yanshao9798 / nlp_slides
Natural Language Processing course, Hangzhou Institute for Advanced Study (杭高院), 2023
☆26 • Nov 22, 2023 • Updated 2 years ago
titizheng / M3amba
Implementation of "M3amba: Memory Mamba is All You Need for Whole Slide Image Classification". CVPR2025
☆12 • Feb 27, 2025 • Updated last year
owenliang / hf-ppo
Huggingface PPO Demo
☆29 • Sep 7, 2025 • Updated 10 months ago
shenduldh / CosyVoice-Lightning
Lightning-responsive CosyVoice streaming API based on FastAPI.
☆28 • Apr 27, 2026 • Updated 2 months ago
datawhalechina / tiny-universe
"A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe
☆4,968 • Feb 12, 2026 • Updated 5 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,571 • Updated this week
wdndev / llm_interview_note
Notes covering knowledge and interview questions for large language model (LLM) algorithm (application) engineers
☆14,733 • Jun 14, 2026 • Updated last month