jackaduma / Alpaca-LoRA-RLHF-PyTorch

A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.

☆58

Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch

Users who are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.


seonghyeonye / TAPP
[AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
☆79 · Updated 10 months ago
gpt4life / alpagasus
Unofficial implementation of AlpaGasus
☆92 · Updated last year
yueyu1030 / AttrPrompt
[NeurIPS 2023] Code for the paper "Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias".
☆150 · Updated last year
thu-coai / PICL
Code for the ACL 2023 paper "Pre-Training to Learn in Context".
☆107 · Updated 11 months ago
i-Eval / FairEval
☆139 · Updated last year
thunlp / Prompt-Transferability
On Transferability of Prompt Tuning for Natural Language Processing
☆99 · Updated last year
Re-Align / just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
☆85 · Updated last year
zguo0525 / Dr.LLaMA
☆56 · Updated 2 years ago
facebookresearch / Shepherd
The repo for the paper "Shepherd: A Critic for Language Model Generation".
☆219 · Updated last year
Spico197 / Humpback
🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.
☆140 · Updated 2 months ago
kaistAI / CoT-Collection
[EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
☆244 · Updated last year
Dahoas / reward-modeling
☆96 · Updated 2 years ago
WadeYin9712 / Dynosaur
Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)
☆64 · Updated last year
pbhu1024 / awesome-augmented-language-model
⚡Research papers about leveraging the capabilities of language models⚡
☆52 · Updated 2 years ago
JetRunner / SuperICL
Code for "Small Models are Valuable Plug-ins for Large Language Models"
☆130 · Updated 2 years ago
sambanova / toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆154 · Updated last year
bhargaviparanjape / language-programmes
☆172 · Updated 2 years ago
OSU-NLP-Group / AttrScore
Code, datasets, models for the paper "Automatic Evaluation of Attribution by Large Language Models"
☆56 · Updated 2 years ago
Raibows / Learn-to-Reason
Code for "Democratizing Reasoning Ability: Tailored Learning from Large Language Model", EMNLP 2023
☆35 · Updated last year
mzbac / llama2-fine-tune
Scripts for fine-tuning Llama2 via SFT and DPO.
☆200 · Updated last year
Anni-Zou / Meta-CoT
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
☆97 · Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆81 · Updated 11 months ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆162 · Updated 2 months ago
night-chen / ToolQA
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …
☆272 · Updated last year
RUCAIBox / ChatCoT
The official repository of "ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models"
☆44 · Updated 2 years ago
facebookresearch / perfect
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
☆109 · Updated 3 years ago
akoksal / LongForm
Reverse Instructions to generate instruction tuning data with corpus examples
☆214 · Updated last year
jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…
☆218 · Updated last year
haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆227 · Updated last year
abhika-m / FAVA
☆72 · Updated last year