The open-source implementation (reproduction) of DeepSeek-R1.
☆275 · Mar 10, 2025 · Updated 11 months ago
Alternatives and similar repositories for Open-R1
Users who are interested in Open-R1 are comparing it to the repositories listed below
- Implementation of Chinese ChatGPT ☆288 · Nov 20, 2023 · Updated 2 years ago
- Easy and efficient fine-tuning of LLMs (supports Llama, Llama2, Llama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models. ☆619 · Jan 24, 2025 · Updated last year
- RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction. ☆79 · Feb 19, 2025 · Updated last year
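- A tool for manually annotating and ranking response data in the RLHF stage of large-model training. ☆256 · Aug 1, 2023 · Updated 2 years ago
- A training and testing tool for large language models built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), as well as model merging and quantization. ☆223 · Dec 8, 2023 · Updated 2 years ago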
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆237 · Aug 17, 2025 · Updated 6 months ago
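- Explanations, extensions, and reproductions of the DeepSeek series of work. ☆699 · Mar 29, 2025 · Updated 11 months ago
- A survey of large language model training and serving ☆37 · Aug 4, 2023 · Updated 2 years ago
- Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT ☆3,731 · Oct 12, 2023 · Updated 2 years ago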
- Codebase for the paper "Schema-guided User Satisfaction Modeling for Task-oriented Dialogues" ☆11 · Aug 6, 2025 · Updated 6 months ago
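- GTS Engine: A powerful NLU training system. GTS-Engine is an out-of-the-box, high-performance natural language understanding engine focused on few-shot tasks; it can automatically produce NLP models from only a small number of samples. ☆93 · Feb 28, 2023 · Updated 3 years ago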
- The persona-generator is a library designed to transform user input (natural language processing) into structured JSON files representing… ☆12 · May 5, 2025 · Updated 9 months ago
- Helmet Detector based on the CenterNet. ☆11 · Jan 30, 2022 · Updated 4 years ago
- LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration ☆11 · Mar 11, 2024 · Updated last year
- Label Studio is a multi-type data labeling and annotation tool with standardized output format ☆10 · Nov 17, 2021 · Updated 4 years ago
- Run DeepSeek-R1 locally with a WebUI ☆43 · Feb 25, 2025 · Updated last year
- BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model) ☆8,281 · Oct 16, 2024 · Updated last year
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆41 · Apr 4, 2025 · Updated 10 months ago
- Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback ☆1,585 · Nov 24, 2025 · Updated 3 months ago
- A named entity recognition model based on BERT-CRF ☆13 · Mar 14, 2022 · Updated 3 years ago
- Colab notebooks from Launch Data Science at HackCville ☆13 · Jun 14, 2019 · Updated 6 years ago
- A native-Chinese, level-graded benchmark for coding ability ☆15 · Apr 11, 2024 · Updated last year
- Use WeChat to connect to openclaw ☆31 · Updated this week
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma… ☆140 · Apr 28, 2023 · Updated 2 years ago
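- A pre-training-based tool for generating sentence embeddings ☆138 · Mar 30, 2023 · Updated 2 years ago
- A PyTorch implementation of the PyramidBox face detection model, with some modules of the original code modified to be simpler and more efficient ☆22 · Dec 8, 2020 · Updated 5 years ago
- Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF ☆198 · May 23, 2023 · Updated 2 years ago
- 2016 Huawei CodeCraft algorithm competition (DFS + pruning) ☆12 · Mar 6, 2017 · Updated 8 years ago
- LLMTechSite, focused on the technology ecosystem of artificial general intelligence. ☆12 · Jan 23, 2026 · Updated last month
- A collection of LLM code written by hand from scratch ☆19 · Mar 25, 2025 · Updated 11 months ago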
- A derivation of the Sequential Minimal Optimization Algorithm for Support Vector Machines ☆11 · Feb 13, 2024 · Updated 2 years ago
- Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-… ☆564 · May 9, 2024 · Updated last year
- Secrets of RLHF in Large Language Models Part I: PPO ☆1,416 · Mar 3, 2024 · Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆117 · Jun 5, 2023 · Updated 2 years ago
- ☆59 · Jul 21, 2025 · Updated 7 months ago
- Simple and efficient multi-GPU fine-tuning of large models with DeepSpeed + Trainer ☆132 · May 27, 2023 · Updated 2 years ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆52 · May 17, 2023 · Updated 2 years ago
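- Cornucopia (聚宝盆): A series of open-source, commercially usable Chinese financial large models, along with an efficient, lightweight training framework for vertical-domain LLMs (pretraining, SFT, RLHF, quantization, etc.) ☆657 · Jun 30, 2023 · Updated 2 years ago
- 🛰️ Fine-tuning ChatGLM on real medical dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our vision goes beyond medical Q&A ☆337 · Sep 2, 2023 · Updated 2 years ago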