A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
☆220 · May 20, 2024 · Updated 2 years ago
Alternatives and similar repositories for Vicuna-LoRA-RLHF-PyTorch
Users who are interested in Vicuna-LoRA-RLHF-PyTorch are comparing it to the repositories listed below.
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human… ☆60 · Apr 28, 2023 · Updated 3 years ago
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma… ☆138 · Apr 28, 2023 · Updated 3 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆239 · Aug 17, 2025 · Updated 10 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆117 · Jun 5, 2023 · Updated 3 years ago
- LLaMA fine-tuning with LoRA ☆140 · May 8, 2024 · Updated 2 years ago
- Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese LLaMA+LoRA solution whose structure follows Alpaca) ☆4,121 · Apr 18, 2025 · Updated last year
- AI-driven Web Application Firewall ☆32 · Dec 12, 2022 · Updated 3 years ago
- Use RLHF directly on ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF ☆197 · May 23, 2023 · Updated 3 years ago
- Instruct-tune LLaMA on consumer hardware ☆18,913 · Jul 29, 2024 · Updated last year
- We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tunin… ☆2,796 · Dec 12, 2023 · Updated 2 years ago
- MOSS chat finetuning ☆51 · Apr 23, 2024 · Updated 2 years ago
- nlp_interview notes and answers: this repository mainly collects interview questions and reference answers for NLP algorithm engineer roles ☆23 · Nov 16, 2023 · Updated 2 years ago
- 4-bit quantization of LLaMA using GPTQ ☆3,073 · Jul 13, 2024 · Updated last year
- Reinforcement Learning for Uplift Modeling ☆13 · Mar 13, 2021 · Updated 5 years ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. ☆91 · Nov 23, 2022 · Updated 3 years ago
- Copy the MLP of Llama 3 8 times as 8 experts, create a router with random initialization, add load balancing loss to construct an 8x8b Mo… ☆27 · Jul 1, 2024 · Updated last year
- Llama 2 finetuning with DeepSpeed and LoRA ☆176 · Jul 28, 2023 · Updated 2 years ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Jul 12, 2023 · Updated 2 years ago
- Train LLaMA and MOSS with RLHF, optionally with LoRA | Training LLaMA or MOSS with RLHF [LoRA] ☆21 · May 16, 2023 · Updated 3 years ago
- Example models using DeepSpeed ☆6,822 · May 20, 2026 · Updated 3 weeks ago
- Training and Inference Notebooks for the RedPajama (OpenLlama) models ☆19 · May 18, 2023 · Updated 3 years ago
- An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. ☆39,476 · May 1, 2026 · Updated last month
- Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback ☆1,605 · Nov 24, 2025 · Updated 6 months ago
- [NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning ☆16 · Sep 20, 2025 · Updated 9 months ago
- chatglm_rlhf_finetuning ☆30 · Oct 10, 2023 · Updated 2 years ago
- Instruction Tuning with GPT-4 ☆4,335 · Jun 11, 2023 · Updated 3 years ago
- This is the code for our ACL 2021 paper entitled eMLM: A New Pre-training Objective for Emotion Related Tasks ☆15 · Sep 7, 2022 · Updated 3 years ago
- ☆25 · Nov 14, 2022 · Updated 3 years ago
- A large language model training and testing tool built on HuggingFace. It supports a web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), and model merging and quantization. ☆225 · Dec 8, 2023 · Updated 2 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,749 · Jan 8, 2024 · Updated 2 years ago
- Let ChatGPT teach your own chatbot in hours with a single GPU! ☆3,156 · Mar 17, 2024 · Updated 2 years ago
- This project uses pre-trained models such as BERT to solve multiple-choice machine reading comprehension tasks (Multiple Choice MRC) ☆16 · Jun 20, 2021 · Updated 4 years ago
- 骆驼 (Luotuo): A Chinese finetuned instruction LLaMA. Developed by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子昂 @ SenseTime ☆718 · May 30, 2023 · Updated 3 years ago
- A curated list of reinforcement learning with human feedback resources (continually updated) ☆4,393 · May 20, 2026 · Updated 3 weeks ago
- ☆27 · Dec 8, 2025 · Updated 6 months ago
- [ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer ☆47 · May 30, 2024 · Updated 2 years ago
- SafeArena is a benchmark for assessing the harmful capabilities of web agents ☆23 · Apr 23, 2025 · Updated last year
- ☆12 · Nov 19, 2022 · Updated 3 years ago
- 骆驼 (Luotuo): Open-Sourced Chinese Language Models. Developed by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子昂 @ SenseTime ☆3,599 · Sep 3, 2023 · Updated 2 years ago