jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna.
☆221May 20, 2024Updated last year
Alternatives and similar repositories for Vicuna-LoRA-RLHF-PyTorch
Users that are interested in Vicuna-LoRA-RLHF-PyTorch are comparing it to the libraries listed below
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆61Apr 28, 2023Updated 2 years ago
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆140Apr 28, 2023Updated 2 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆237Aug 17, 2025Updated 5 months ago
- llama fine-tuning with lora☆140May 8, 2024Updated last year
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel…☆11Jan 23, 2026Updated 3 weeks ago
- moss chat finetuning☆51Apr 23, 2024Updated last year
- Use RLHF directly on ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF☆198May 23, 2023Updated 2 years ago
- We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tunin…☆2,799Dec 12, 2023Updated 2 years ago
- Instruct-tune LLaMA on consumer hardware☆18,978Jul 29, 2024Updated last year
- Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback☆1,582Nov 24, 2025Updated 2 months ago
- A collection of papers on zero-shot/data-free knowledge distillation from 2019 to 2021☆11Sep 8, 2021Updated 4 years ago
- ☆43Dec 15, 2023Updated 2 years ago
- InternLM-7B fine-tuning, SFT/LoRA, instruction finetune☆13May 17, 2024Updated last year
- ☆11Oct 19, 2020Updated 5 years ago
- A crowd-powered database system, with SQL-like query interface, multi-goal optimization☆11Sep 4, 2017Updated 8 years ago
- 4 bits quantization of LLaMA using GPTQ☆3,074Jul 13, 2024Updated last year
- Copy the MLP of llama3 8 times as 8 experts, create a router with random initialization, add load balancing loss to construct an 8x8b Mo…☆27Jul 1, 2024Updated last year
- Instruction Tuning with GPT-4☆4,340Jun 11, 2023Updated 2 years ago
- Multimodal RAG using LlamaIndex, Qdrant, llama.cpp for document QA with local Vision LLM and embedding models☆17Nov 8, 2024Updated last year
- ☆20Apr 20, 2023Updated 2 years ago
- ☆12Aug 15, 2022Updated 3 years ago
- ☆13Nov 19, 2022Updated 3 years ago
- Easy implementations of GCN on Elliptic Datasets☆13Dec 19, 2020Updated 5 years ago
- Text classification with Foundation Language Model LLaMA☆113Mar 19, 2023Updated 2 years ago
- Example models using DeepSpeed☆6,785Feb 7, 2026Updated last week
- ☆25Nov 14, 2022Updated 3 years ago
- Simple and efficient multi-GPU fine-tuning of large models with deepspeed+trainer☆133May 27, 2023Updated 2 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)☆4,742Jan 8, 2024Updated 2 years ago
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks☆178Sep 15, 2023Updated 2 years ago
- Using the Elliptic Data Set (https://www.kaggle.com/ellipticco/elliptic-data-set) and working to improve on the original results by Weber…☆15Jul 13, 2024Updated last year
- This project uses pretrained models such as BERT to implement multiple-choice machine reading comprehension (Multiple Choice MRC)☆16Jun 20, 2021Updated 4 years ago
- ☆20Dec 8, 2025Updated 2 months ago
- Reinforcement Learning for Uplift Modeling☆13Mar 13, 2021Updated 4 years ago
- chatglm multi-GPU with deepspeed and☆409Jul 8, 2024Updated last year
- Let ChatGPT teach your own chatbot in hours with a single GPU!☆3,167Mar 17, 2024Updated last year
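- A large language model training and testing tool built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pretraining, SFT, RM, PPO, DPO), and merging and quantization.☆223Dec 8, 2023Updated 2 years ago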
- A curated list of reinforcement learning with human feedback resources (continually updated)☆4,289Dec 9, 2025Updated 2 months ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback.☆90Nov 23, 2022Updated 3 years ago
- chatglm_rlhf_finetuning☆30Oct 10, 2023Updated 2 years ago