jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
☆215 Updated last year
Alternatives and similar repositories for Vicuna-LoRA-RLHF-PyTorch
Users that are interested in Vicuna-LoRA-RLHF-PyTorch are comparing it to the libraries listed below
- [NIPS2023] RRHF & Wombat ☆809 Updated last year
- llama fine-tuning with lora ☆139 Updated last year
- Official repository for LongChat and LongEval ☆518 Updated last year
- Naive Bayes-based Context Extension ☆326 Updated 5 months ago
- Scripts for fine-tuning Llama2 via SFT and DPO. ☆200 Updated last year
- Multi-language Enhanced LLaMA ☆301 Updated 2 years ago
- ☆124 Updated last year
- All available datasets for Instruction Tuning of Large Language Models ☆250 Updated last year
- ☆269 Updated 2 years ago
- ☆459 Updated 11 months ago
- [EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models ☆206 Updated last year
- ☆361 Updated 2 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆217 Updated 2 years ago
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human… ☆58 Updated 2 years ago
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma… ☆135 Updated 2 years ago
- ☆457 Updated last year
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆178 Updated last year
- An opensource ChatBot built with ExpertPrompting which achieves 96% of ChatGPT's capability. ☆300 Updated 2 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆115 Updated 2 years ago
- Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning ☆396 Updated last year
- Implementation of Toolformer: Language Models Can Teach Themselves to Use Tools ☆139 Updated 2 years ago
- llama2 finetuning with deepspeed and lora ☆174 Updated last year
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆547 Updated last year
- Due to restriction of LLaMA, we try to reimplement BLOOM-LoRA (much less restricted BLOOM license here https://huggingface.co/spaces/bigs… ☆185 Updated last year
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. ☆812 Updated 11 months ago
- Official codebase for "SelFee: Iterative Self-Revising LLM Empowered by Self-Feedback Generation" ☆227 Updated 2 years ago
- A large-scale, fine-grained, diverse preference dataset (and models). ☆340 Updated last year
- Simple next-token-prediction for RLHF ☆227 Updated last year
- Generative Judge for Evaluating Alignment ☆238 Updated last year
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition ☆637 Updated 10 months ago