jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
☆58 Updated last year
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch:
Users who are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- ☆97 Updated last year
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆85 Updated last year
- Unofficial implementation of AlpaGasus ☆90 Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆139 Updated 10 months ago
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 Updated 7 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆158 Updated 11 months ago
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning. ☆130 Updated last year
- Code for ACL2023 paper: Pre-Training to Learn in Context ☆108 Updated 8 months ago
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023) ☆63 Updated last year
- ☆138 Updated last year
- ☆172 Updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47 Updated last year
- ☆64 Updated 2 years ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆238 Updated last year
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆110 Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆45 Updated 4 months ago
- [IJCAI 2024] FactCHD: Benchmarking Fact-Conflicting Hallucination Detection ☆87 Updated 11 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆70 Updated last year
- ☆57 Updated last year
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`. ☆150 Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆139 Updated 5 months ago
- All available datasets for Instruction Tuning of Large Language Models ☆248 Updated last year
- Code for "Small Models are Valuable Plug-ins for Large Language Models" ☆129 Updated last year
- Scripts for fine-tuning Llama2 via SFT and DPO. ☆197 Updated last year
- A Multi-Turn Dialogue Corpus based on Alpaca Instructions ☆169 Updated last year
- Plug-and-play implementation of "Textbooks Are All You Need", ready for training, inference, and dataset generation ☆76 Updated last year
- Self-Alignment with Principle-Following Reward Models ☆160 Updated last year
- [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”. ☆99 Updated 2 years ago
- Simple next-token-prediction for RLHF ☆225 Updated last year
- Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?" ☆56 Updated 2 years ago