Joyce94 / LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
☆374Updated last year
Related projects
Alternatives and complementary repositories for LLM-RLHF-Tuning
- HugNLP is a unified and comprehensive NLP library based on HuggingFace Transformer. Please hugging for NLP now!😊 HugNLP will released to…☆249Updated last year
- [TACL 2024] MAPS enables LLMs🤖 to mimic the human😁 translation process.☆133Updated 5 months ago
- CIKM2023 Best Demo Paper Award. HugNLP is a unified and comprehensive NLP library based on HuggingFace Transformer. Please hugging for NL…☆382Updated last year
- Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"☆131Updated 7 months ago
- [ACL 2024] RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback.☆182Updated 2 months ago
- Flask and FastAPI implementations of an HTTP streaming-decoding API for ChatGLM-6B, plus a ready-to-use web page.☆93Updated 9 months ago
- ☆585Updated 3 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆107Updated last year
- Efficient, Low-Resource, Distributed transformer implementation based on BMTrain☆243Updated 11 months ago
- llama2 finetuning with deepspeed and lora☆167Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆308Updated 2 months ago
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆126Updated last year
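- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)☆334Updated 3 months ago
- LLM knowledge sharing that anyone can follow; a must-read before LLM interviews in the autumn recruiting season, so you can talk confidently with interviewers☆215Updated this week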
- ☆64Updated last year
- The official repository of our survey paper: "Towards a Unified View of Preference Learning for Large Language Models: A Survey"☆154Updated 3 weeks ago
- Related work and background techniques for OpenAI o1☆144Updated 2 weeks ago
- Collaborative Training of Large Language Models in an Efficient Way☆411Updated 2 months ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆185Updated last year
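- A HuggingFace-based tool for training and testing large language models. Supports web UI and terminal inference for each model, low-parameter and full-parameter training (pre-training, SFT, RM, PPO, DPO), and model merging and quantization.☆203Updated 11 months ago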
- Awesome papers for role-playing with language models☆125Updated 2 weeks ago
- ☆85Updated last week
- An instruction-tuning tool for large language models (supports FlashAttention)☆166Updated 10 months ago
- ☆120Updated 7 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning☆218Updated last year
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆325Updated last month
- A tool for manually annotating and ranking response data in the RLHF stage.☆243Updated last year
- Training an LLM from scratch on a single 24 GB GPU☆49Updated last month
- Toolkit for Prompt Compression☆245Updated last month
- Simple and efficient multi-GPU fine-tuning of large models with deepspeed + trainer☆116Updated last year