jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
☆58 · Updated last year
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch:
Users who are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆82 · Updated last year
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆136 · Updated 7 months ago
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 · Updated 4 months ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆69 · Updated 11 months ago
- Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning" ☆164 · Updated 3 years ago
- ☆96 · Updated last year
- Code for "Small Models are Valuable Plug-ins for Large Language Models" ☆126 · Updated last year
- Code for ACL2023 paper: Pre-Training to Learn in Context ☆108 · Updated 6 months ago
- ☆137 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆152 · Updated 11 months ago
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆65 · Updated 5 months ago
- Unofficial implementation of AlpaGasus ☆90 · Updated last year
- Code for "Democratizing Reasoning Ability: Tailored Learning from Large Language Model", EMNLP 2023 ☆31 · Updated last year
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023) ☆63 · Updated last year
- Research papers about leveraging the capabilities of language models ☆52 · Updated last year
- Counting-Stars (★) ☆78 · Updated 5 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆90 · Updated last year
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets" ☆50 · Updated 7 months ago
- ☆48 · Updated 10 months ago
- "TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024] ☆28 · Updated last month
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning. ☆128 · Updated last year
- Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?" ☆56 · Updated last year
- Code and Data Repo for [ACL 2023] Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)" ☆53 · Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 · Updated last year
- ☆39 · Updated last year
- [ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales ☆66 · Updated 2 months ago
- The official repository of "ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models" ☆42 · Updated last year
- ☆94 · Updated 4 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated 11 months ago
- This repository contains the code to train Flan-T5 with Alpaca instructions and low-rank adaptation. ☆48 · Updated last year