A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT but with ChatGLM.
☆138Apr 28, 2023Updated 3 years ago
Alternatives and similar repositories for ChatGLM-LoRA-RLHF-PyTorch
Users who are interested in ChatGLM-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- chatglm_rlhf_finetuning☆30Oct 10, 2023Updated 2 years ago
- Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF☆197May 23, 2023Updated 3 years ago
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆60Apr 28, 2023Updated 3 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated 2 years ago
- ☆43Dec 15, 2023Updated 2 years ago
- Fine-tuning ChatGLM☆128May 5, 2023Updated 3 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆239Aug 17, 2025Updated 9 months ago
- AI driven Web Application Firewall☆32Dec 12, 2022Updated 3 years ago
- Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT☆3,721Oct 12, 2023Updated 2 years ago
- ZYN: Zero-Shot Reward Models with Yes-No Questions☆35Aug 15, 2023Updated 2 years ago
- chatglm multi-GPU with deepspeed and☆409Jul 8, 2024Updated last year
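- Adds an RLHF implementation to ChatGLM-6B, with line-by-line explanations of some of the core code; the examples cover short news headline generation and an RLHF implementation for recommendation given a specified context☆88Aug 16, 2023Updated 2 years ago
- chatglm-6b fine-tuning/LoRA/PPO/inference; the samples are automatically generated integer and decimal arithmetic (addition, subtraction, multiplication, division); runs on GPU or CPU☆165Aug 24, 2023Updated 2 years ago
- nlp_interview notes and answers: this repository mainly records interview questions and reference answers for NLP algorithm engineers☆23Nov 16, 2023Updated 2 years ago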
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆117Jun 5, 2023Updated 3 years ago
- Humanable Chat Generative-model Fine-tuning | LLM fine-tuning☆206Sep 22, 2023Updated 2 years ago
- ChatGLM2-6B full-parameter fine-tuning, with support for efficient fine-tuning on multi-turn dialogue.☆400Aug 17, 2023Updated 2 years ago
- Secrets of RLHF in Large Language Models Part I: PPO☆1,426Mar 3, 2024Updated 2 years ago
- ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SF…☆2,420Sep 29, 2023Updated 2 years ago
- A fine-tuning approach based on ChatGLM-6B + LoRA☆3,746Nov 25, 2023Updated 2 years ago
- PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation☆16Mar 28, 2023Updated 3 years ago
- chatglm 6b finetuning and alpaca finetuning☆1,531Mar 9, 2025Updated last year
- Llama 3 ORPO Fine Tuning on A100 in Colab Pro.☆12Apr 21, 2024Updated 2 years ago
- Training LLaMA and MOSS with RLHF and optional LoRA | Training LLaMA or MOSS with RLHF [LoRA]☆21May 16, 2023Updated 3 years ago
- The source code used for paper "TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision…☆24Apr 6, 2025Updated last year
- ☆23Jun 23, 2023Updated 2 years ago
- Self-hosted ChatGLM-6B API built with FastAPI☆78Mar 24, 2023Updated 3 years ago
- This is the code for our paper: PLACES: Prompting Language Models for Social Conversation Synthesis☆11Feb 17, 2023Updated 3 years ago
- Instruction Tuning with GPT-4☆4,336Jun 11, 2023Updated 3 years ago
- Implementation of Chinese ChatGPT☆287Nov 20, 2023Updated 2 years ago
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune☆109Jul 19, 2023Updated 2 years ago
- ChatGLM-6B instruction learning | instruction data | Instruct☆651Apr 10, 2023Updated 3 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- MarketGPT: Developing a Pre-trained transformer (GPT) for Modeling Financial Time Series☆18Sep 5, 2025Updated 9 months ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)☆4,750Jan 8, 2024Updated 2 years ago
- LLaMa Tuning with Stanford Alpaca Dataset using Deepspeed and Transformers☆49Mar 15, 2023Updated 3 years ago
- Role-playing based on the RWKV model; in practice a version of RWKV_Role_Playing modified beyond recognition☆17Aug 17, 2023Updated 2 years ago
- A curated list of reinforcement learning with human feedback resources (continually updated)☆4,386May 20, 2026Updated 3 weeks ago
- Simple and efficient multi-GPU fine-tuning of large models with deepspeed + trainer☆133May 27, 2023Updated 3 years ago