jackaduma/Vicuna-LoRA-RLHF-PyTorch

A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.

☆220

Alternatives and similar repositories for Vicuna-LoRA-RLHF-PyTorch

Users that are interested in Vicuna-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.

jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…
☆60 · Apr 28, 2023 · Updated 3 years ago
jackaduma / ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…
☆138 · Apr 28, 2023 · Updated 3 years ago
jasonvanf / llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
☆240 · Aug 17, 2025 · Updated 11 months ago
l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
☆118 · Jun 5, 2023 · Updated 3 years ago
git-cloner / llama-lora-fine-tuning
llama fine-tuning with lora
☆140 · May 8, 2024 · Updated 2 years ago
Facico / Chinese-Vicuna
Chinese-Vicuna: A Chinese instruction-following LLaMA-based model. A low-resource Chinese LLaMA + LoRA approach, with a structure modeled on Alpaca.
☆4,119 · Apr 18, 2025 · Updated last year
Miraclemarvel55 / ChatGLM-RLHF
Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
☆196 · May 23, 2023 · Updated 3 years ago
tloen / alpaca-lora
Instruct-tune LLaMA on consumer hardware
☆18,909 · Jul 29, 2024 · Updated last year
PhoebusSi / Alpaca-CoT
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tunin…
☆2,791 · Dec 12, 2023 · Updated 2 years ago
jackaduma / awesome_NLP-Interview-Notes
NLP interview notes and answers: this repository mainly collects interview questions and reference answers for NLP algorithm engineers.
☆24 · Nov 16, 2023 · Updated 2 years ago
oppsitre / RLift
Reinforcement Learning for Uplift Modeling
☆13 · Mar 13, 2021 · Updated 5 years ago
qwopqwop200 / GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
☆3,071 · Jul 13, 2024 · Updated 2 years ago
ssbuild / moss_finetuning
moss chat finetuning
☆51 · Apr 23, 2024 · Updated 2 years ago
thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆91 · Nov 23, 2022 · Updated 3 years ago
git-cloner / llama2-lora-fine-tuning
llama2 finetuning with deepspeed and lora
☆176 · Jul 28, 2023 · Updated 2 years ago
scottlogic-alex / prm800k-denorm
Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format
☆27 · Jul 12, 2023 · Updated 3 years ago
johnrobinsn / redpajama
Training and Inference Notebooks for the RedPajama (OpenLlama) models
☆19 · May 18, 2023 · Updated 3 years ago
deepspeedai / DeepSpeedExamples
Example models using DeepSpeed
☆6,831 · Updated this week
Miraclemarvel55 / LLaMA-MOSS-RLHF-LoRA
Train LLaMA and MOSS with RLHF, optionally with LoRA | Training LLaMA or MOSS with RLHF [LoRA]
☆21 · May 16, 2023 · Updated 3 years ago
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,611 · Nov 24, 2025 · Updated 7 months ago
zetavg / LLaMA-LoRA-Tuner
UI tool for fine-tuning and testing your own LoRA models based on LLaMA, GPT-J and more. One-click run on Google Colab. + A Gradio ChatGPT…
☆474 · May 29, 2023 · Updated 3 years ago
ssbuild / chatglm_rlhf
chatglm_rlhf_finetuning
☆30 · Oct 10, 2023 · Updated 2 years ago
Instruction-Tuning-with-GPT-4 / GPT-4-LLM
Instruction Tuning with GPT-4
☆4,332 · Jun 11, 2023 · Updated 3 years ago
DennisLiu2022 / Membership-Inference-Attacks-by-Exploiting-Loss-Trajectory
☆25 · Nov 14, 2022 · Updated 3 years ago
VITA-Group / DP-OPT
[ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
☆48 · May 30, 2024 · Updated 2 years ago
tsosea2 / eMLM
This is the code for our ACL 2021 paper entitled eMLM: A New Pre-training Objective for Emotion Related Tasks
☆15 · Sep 7, 2022 · Updated 3 years ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753 · Jan 8, 2024 · Updated 2 years ago
stanleylsx / llms_tool
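A training and testing tool for large language models, built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pretraining, SFT, RM, PPO, DPO), and model merging and quantization.
☆226 · Dec 8, 2023 · Updated 2 years ago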
project-baize / baize-chatbot
Let ChatGPT teach your own chatbot in hours with a single GPU!
☆3,151 · Mar 17, 2024 · Updated 2 years ago
percent4 / keras_bert_multiple_choice_MRC
This project uses BERT and other pretrained models to implement multiple-choice machine reading comprehension (Multiple Choice MRC).
☆16 · Jun 20, 2021 · Updated 5 years ago
BillSchumacher / Auto-GPT-Vicuna
☆19 · Apr 20, 2023 · Updated 3 years ago
LC1332 / Chinese-alpaca-lora
Luotuo (骆驼): A Chinese finetuned instruction LLaMA. Developed by 陈启源 (Central China Normal University), 李鲁鲁 (SenseTime), and 冷子昂 (SenseTime)
☆717 · May 30, 2023 · Updated 3 years ago
opendilab / awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
☆4,416 · May 20, 2026 · Updated 2 months ago
UKPLab / EACL21-personalized-conversational-system
☆12 · Nov 19, 2022 · Updated 3 years ago
LC1332 / Luotuo-Chinese-LLM
Luotuo (骆驼): Open Sourced Chinese Language Models. Developed by 陈启源 (Central China Normal University), 李鲁鲁 (SenseTime), and 冷子昂 (SenseTime)
☆3,590 · Sep 3, 2023 · Updated 2 years ago
tomekkorbak / pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
☆182 · Feb 13, 2024 · Updated 2 years ago
providence-replay / providence
An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel…
☆11 · Jan 23, 2026 · Updated 5 months ago
liangwq / Chatglm_lora_multi-gpu
ChatGLM on multiple GPUs with DeepSpeed and…
☆409 · Jul 8, 2024 · Updated 2 years ago
27182812 / ChatGLM-LLaMA-chinese-insturct
Exploring the fine-tuning performance of Chinese instruct data on ChatGLM and LLaMA
☆387 · Apr 4, 2023 · Updated 3 years ago