jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
☆219Updated last year
Alternatives and similar repositories for Vicuna-LoRA-RLHF-PyTorch
Users who are interested in Vicuna-LoRA-RLHF-PyTorch are comparing it to the libraries listed below
- [NIPS2023] RRHF & Wombat☆811Updated 2 years ago
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆140Updated 2 years ago
- llama fine-tuning with lora☆140Updated last year
- ☆123Updated 2 years ago
- Scripts for fine-tuning Llama2 via SFT and DPO.☆206Updated 2 years ago
- An opensource ChatBot built with ExpertPrompting which achieves 96% of ChatGPT's capability.☆299Updated 2 years ago
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆60Updated 2 years ago
- [EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models☆212Updated last year
- Official repository for LongChat and LongEval☆533Updated last year
- Multi-language Enhanced LLaMA☆303Updated 2 years ago
- ☆459Updated last year
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks☆178Updated 2 years ago
- The open source implementation of DeepSeek-R1. An open-source reproduction of DeepSeek-R1☆273Updated 9 months ago
- ☆330Updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆235Updated 3 months ago
- Large Language Models Are Reasoning Teachers (ACL 2023)☆343Updated 9 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆116Updated 2 years ago
- Large language model fine-tuning for bloom, opt, gpt, gpt2, llama, llama-2, cpmant and so on☆99Updated last year
- llama2 finetuning with deepspeed and lora☆175Updated 2 years ago
- ☆923Updated last year
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.☆838Updated last year
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…☆354Updated 2 years ago
- Naive Bayes-based Context Extension☆325Updated last year
- A large-scale, fine-grained, diverse preference dataset (and models).☆356Updated last year
- Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning☆405Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.☆138Updated 7 months ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆225Updated 2 years ago
- Datasets for Instruction Tuning of Large Language Models☆259Updated 2 years ago
- ☆282Updated last year
- A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters…☆208Updated last year