l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
☆114 Updated last year
Alternatives and similar repositories for alpaca-rlhf:
Users who are interested in alpaca-rlhf are comparing it to the repositories listed below.
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed Pipeline Parallelism ☆216 Updated last year
- How to train an LLM tokenizer ☆142 Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆347 Updated 6 months ago
- ☆134 Updated 11 months ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆76 Updated 4 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning