A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM
☆140Apr 28, 2023Updated 2 years ago
Alternatives and similar repositories for ChatGLM-LoRA-RLHF-PyTorch
Users that are interested in ChatGLM-LoRA-RLHF-PyTorch are comparing it to the libraries listed below
- Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF☆198May 23, 2023Updated 2 years ago
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆61Apr 28, 2023Updated 2 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆237Aug 17, 2025Updated 6 months ago
- moss chat finetuning☆51Apr 23, 2024Updated last year
- Fine-tuning ChatGLM☆128May 5, 2023Updated 2 years ago
- MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series☆17Sep 5, 2025Updated 5 months ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions☆35Aug 15, 2023Updated 2 years ago
- Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT☆3,731Oct 12, 2023Updated 2 years ago
- chatglm multi-GPU using deepspeed and☆408Jul 8, 2024Updated last year
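- chatglm-6b fine-tuning/LoRA/PPO/inference; samples are automatically generated integer/decimal arithmetic (addition, subtraction, multiplication, division); runs on GPU or CPU☆165Aug 24, 2023Updated 2 years ago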
- A Toolkit for Fine-Tuning Large Language Models with LoRA and DeepSpeed☆11Apr 14, 2023Updated 2 years ago
- ☆13May 25, 2023Updated 2 years ago
- ICLR 2021: "Monte-Carlo Planning and Learning with Language Action Value Estimates"☆33Nov 30, 2023Updated 2 years ago
- This is the code for our paper: PLACES: Prompting Language Models for Social Conversation Synthesis☆11Feb 17, 2023Updated 3 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆117Jun 5, 2023Updated 2 years ago
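- Adds an RLHF implementation to ChatGLM-6B, with line-by-line explanations of some of the core code; the examples cover short news headline generation and an RLHF implementation for recommendation given a specified context☆88Aug 16, 2023Updated 2 years ago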
- PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation☆16Mar 28, 2023Updated 2 years ago
- GOAT (山羊, "goat") is a Chinese-English large language model, built by SFT on LLaMA.☆12Apr 24, 2023Updated 2 years ago
- Repository containing the website for the EMNLP 2023 conference☆17Feb 12, 2025Updated last year
- Text Generation Using RNNs☆12Dec 15, 2018Updated 7 years ago
- Secrets of RLHF in Large Language Models Part I: PPO☆1,416Mar 3, 2024Updated 2 years ago
- Implementation of Chinese ChatGPT☆288Nov 20, 2023Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- RWA in pytorch☆14May 7, 2017Updated 8 years ago
- Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification"☆16Jul 4, 2023Updated 2 years ago
- Role-playing based on the RWKV model; in practice a version of RWKV_Role_Playing modified beyond recognition☆17Aug 17, 2023Updated 2 years ago
- ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SF…☆2,409Sep 29, 2023Updated 2 years ago
- this is an implementation for the paper Improve Mathematical Reasoning in Language Models by Automated Process Supervision from google de…☆44Jul 8, 2025Updated 7 months ago
- Fine-tuning for specific downstream tasks based on the ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B models, covering Freeze, LoRA, P-tuning, full-parameter fine-tuning, and more☆2,777Dec 12, 2023Updated 2 years ago
- ☆18Nov 13, 2021Updated 4 years ago
- Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI m…☆225Jul 24, 2023Updated 2 years ago
- chatglm 6b finetuning and alpaca finetuning☆1,536Mar 9, 2025Updated 11 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888☆37Jun 10, 2024Updated last year
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune☆110Jul 19, 2023Updated 2 years ago
- Instruction Tuning with GPT-4☆4,341Jun 11, 2023Updated 2 years ago
- chatglm3-6b, fine-tuning/LoRA/inference/single-node multi-GPU/deepspeed/multi-turn dialogue support☆17Nov 30, 2023Updated 2 years ago
- SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling☆14Oct 10, 2018Updated 7 years ago
- A fine-tuning approach based on ChatGLM-6B + LoRA☆3,759Nov 25, 2023Updated 2 years ago