jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Essentially ChatGPT, but built on Alpaca.
☆56, updated last year
Related projects
Alternatives and complementary repositories for Alpaca-LoRA-RLHF-PyTorch
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following (☆79, updated 2 months ago)
- Unofficial implementation of AlpaGasus (☆84, updated last year)
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023) (☆62, updated 11 months ago)
- Implementation of the paper "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" (☆62, updated 3 months ago)
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… (☆83, updated 4 months ago)
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation (☆132, updated 4 months ago)
- Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?" (☆57, updated last year)
- Code for the ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" (☆48, updated 5 months ago)
- Code and data repo for the ACL'23 paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)" (☆53, updated 10 months ago)
- Code for the ACL 2023 paper "Pre-Training to Learn in Context" (☆107, updated 3 months ago)
- Code for "Small Models are Valuable Plug-ins for Large Language Models" (☆122, updated last year)
- Counting-Stars (★) (☆76, updated 2 months ago)
- MultilingualSIFT: Multilingual Supervised Instruction Fine-tuning (☆86, updated last year)
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] (☆124, updated 3 weeks ago)
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" (☆56, updated 8 months ago)
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning (☆125, updated 2 months ago)
- [NeurIPS 2023] Code for the paper "Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias" (☆141, updated last year)
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion (☆46, updated 8 months ago)
- On Transferability of Prompt Tuning for Natural Language Processing (☆97, updated 6 months ago)
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI (☆91, updated last year)
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning (☆213, updated last year)
- Self-Alignment with Principle-Following Reward Models (☆147, updated 8 months ago)
- ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios (☆62, updated 7 months ago)
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following (☆118, updated 4 months ago)