A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
☆140Apr 28, 2023Updated 2 years ago
Alternatives and similar repositories for ChatGLM-LoRA-RLHF-PyTorch
Users that are interested in ChatGLM-LoRA-RLHF-PyTorch are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- 对ChatGLM直接使用RLHF提升或降低目标输出概率|Modify ChatGLM output with only RLHF☆198May 23, 2023Updated 2 years ago
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆61Apr 28, 2023Updated 2 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated last year
- ☆43Dec 15, 2023Updated 2 years ago
- moss chat finetuning☆51Apr 23, 2024Updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆239Aug 17, 2025Updated 7 months ago
- Fine-tuning ChatGLM-6B with PEFT | 基于 PEFT 的高效 ChatGLM 微调☆3,730Oct 12, 2023Updated 2 years ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions☆35Aug 15, 2023Updated 2 years ago
- chatglm多gpu用deepspeed和☆408Jul 8, 2024Updated last year
- this is an implementation for the paper Improve Mathematical Reasoning in Language Models by Automated Process Supervision from google de…☆44Jul 8, 2025Updated 8 months ago
- chatglm-6b微调/LORA/PPO/推理, 样本为自动生成的整数/小数加减乘除运算, 可gpu/cpu☆165Aug 24, 2023Updated 2 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆117Jun 5, 2023Updated 2 years ago
- Humanable Chat Generative-model Fine-tuning | LLM微调☆207Sep 22, 2023Updated 2 years ago
- ChatGLM2-6B 全参数微调,支持多轮对话的高效微调。☆402Aug 17, 2023Updated 2 years ago
- ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SF…☆2,413Sep 29, 2023Updated 2 years ago
- PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation☆16Mar 28, 2023Updated 2 years ago
- 基于ChatGLM-6B + LoRA的Fintune方案☆3,758Nov 25, 2023Updated 2 years ago
- A Toolkit for Fine-Tuning Large Language Models with LoRA and DeepSpeed☆11Apr 14, 2023Updated 2 years ago
- Training a reward model for RLHF using RWKV.☆15Jun 5, 2023Updated 2 years ago
- chatglm 6b finetuning and alpaca finetuning☆1,537Mar 9, 2025Updated last year
- Llama 3 ORPO Fine Tuning on A100 in Colab Pro.☆12Apr 21, 2024Updated last year
- 用RLHF可选LoRA对LLaMA和MOSS进行训练|Training LLaMA or MOSS with RLHF [LoRA]☆21May 16, 2023Updated 2 years ago
- 本项目用于存放论文:基于远程监督的人物属性抽取研究 的实验数据☆13Oct 30, 2019Updated 6 years ago
- The source code used for paper "TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision…☆25Apr 6, 2025Updated 11 months ago
- self-host ChatGLM-6B API made with fastapi☆79Mar 24, 2023Updated 3 years ago
- This is the code for our paper: PLACES: Prompting Language Models for Social Conversation Synthesis☆11Feb 17, 2023Updated 3 years ago
- Instruction Tuning with GPT-4☆4,337Jun 11, 2023Updated 2 years ago
- 基于ChatGLM-6B、ChatGLM2-6B、ChatGLM3-6B模型,进行下游具体任务微调,涉及Freeze、Lora、P-tuning、全参微调等☆2,781Dec 12, 2023Updated 2 years ago
- Implementation of Chinese ChatGPT☆289Nov 20, 2023Updated 2 years ago
- ChatGLM2-6B微调, SFT/LoRA, instruction finetune☆110Jul 19, 2023Updated 2 years ago
- ChatGLM-6B 指令学习|指令数据|Instruct☆653Apr 10, 2023Updated 2 years ago
- 实现Blip2RWKV+QFormer的多模态图文对话大模型,使用Two-Step Cognitive Psychology Prompt方法,仅3B参数的模型便能够出现类人因果思维链。对标MiniGPT-4,ImageBind等图文对话大语言模型,力求以更小的算力和资源实…☆42Jul 17, 2023Updated 2 years ago
- Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI m…☆227Jul 24, 2023Updated 2 years ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)☆4,742Jan 8, 2024Updated 2 years ago
- MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series☆17Sep 5, 2025Updated 6 months ago
- ☆16Jan 17, 2022Updated 4 years ago
- LLaMa Tuning with Stanford Alpaca Dataset using Deepspeed and Transformers☆50Mar 15, 2023Updated 3 years ago
- 基于RWKV模型的角色扮演,实际上是个改的妈都不认识的 RWKV_Role_Playing☆17Aug 17, 2023Updated 2 years ago