jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
☆59Updated 2 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation☆219Updated 2 years ago
- Unofficial implementation of AlpaGasus☆92Updated last year
- Self-Alignment with Principle-Following Reward Models☆163Updated 3 months ago
- On Transferability of Prompt Tuning for Natural Language Processing☆99Updated last year
- ☆96Updated 2 years ago
- ☆140Updated last year
- Code for ACL2023 paper: Pre-Training to Learn in Context☆107Updated last year
- A dataset for training/evaluating Question Answering Retrieval models on ChatGPT responses with the possibility to training/evaluating on…☆141Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.☆140Updated 3 months ago
- ☆172Updated 2 years ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆85Updated last year
- Code for "Small Models are Valuable Plug-ins for Large Language Models"☆131Updated 2 years ago
- Reverse Instructions to generate instruction tuning data with corpus examples☆214Updated last year
- PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models☆109Updated 3 years ago
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks☆178Updated last year
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following☆79Updated 10 months ago
- ☆180Updated 2 years ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning☆245Updated last year
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)☆64Updated last year
- [ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM☆186Updated 6 months ago
- Simple next-token-prediction for RLHF☆227Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities.☆157Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement☆187Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆81Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …☆274Updated last year
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t…☆70Updated last year
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.☆153Updated last year
- Source codes and datasets for How well do Large Language Models perform in Arithmetic tasks?☆56Updated 2 years ago
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment"☆69Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.☆64Updated 8 months ago