jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
☆219Updated last year
Alternatives and similar repositories for Vicuna-LoRA-RLHF-PyTorch
Users that are interested in Vicuna-LoRA-RLHF-PyTorch are comparing it to the libraries listed below
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆137Updated 2 years ago
- llama fine-tuning with lora☆139Updated last year
- ☆124Updated last year
- [NIPS2023] RRHF & Wombat☆811Updated last year
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks☆178Updated last year
- Scripts for fine-tuning Llama2 via SFT and DPO.☆203Updated last year
- Large language model fine-tuning for BLOOM, OPT, GPT, GPT-2, LLaMA, LLaMA-2, CPM-Ant, and so on☆97Updated last year
- Implementation of Toolformer: Language Models Can Teach Themselves to Use Tools☆140Updated 2 years ago
- ☆459Updated last year
- Large Language Models Are Reasoning Teachers (ACL 2023)☆341Updated 5 months ago
- ☆324Updated last year
- [EMNLP 2023] Lion: Adversarial Distillation of Proprietary Large Language Models☆210Updated last year
- Multi-language Enhanced LLaMA☆301Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Updated last year
- Naive Bayes-based Context Extension☆325Updated 8 months ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆220Updated 2 years ago
- Official repository for LongChat and LongEval☆524Updated last year
- An open-source chatbot built with ExpertPrompting which achieves 96% of ChatGPT's capability.☆300Updated 2 years ago
- Due to restriction of LLaMA, we try to reimplement BLOOM-LoRA (much less restricted BLOOM license here https://huggingface.co/spaces/bigs…☆184Updated 2 years ago
- The open-source implementation of DeepSeek-R1.☆266Updated 4 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆114Updated 2 years ago
- A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆59Updated 2 years ago
- deep learning☆148Updated 3 months ago
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…☆352Updated 2 years ago
- All available datasets for Instruction Tuning of Large Language Models☆255Updated last year
- A Multi-Turn Dialogue Corpus based on Alpaca Instructions☆173Updated 2 years ago
- llama2 finetuning with deepspeed and lora☆176Updated 2 years ago
- ☆367Updated 2 years ago
- ☆280Updated last year
- A self-alignment method for role-play. Benchmark for role-play. Resources for "Large Language Models are Superpositions of All Characters…☆203Updated last year