jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca.
☆60 · Updated 2 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users who are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆86 · Updated last year
- Unofficial implementation of AlpaGasus ☆92 · Updated last year
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`. ☆153 · Updated last year
- ☆172 · Updated 2 years ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆247 · Updated last year
- Code for ACL2023 paper: Pre-Training to Learn in Context ☆107 · Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆81 · Updated last year
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆78 · Updated 11 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆139 · Updated 3 months ago
- ☆74 · Updated last year
- Contrastive Chain-of-Thought Prompting ☆68 · Updated last year
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023) ☆64 · Updated last year
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆219 · Updated 2 years ago
- the instructions and demonstrations for building a formal logical reasoning capable GLM ☆54 · Updated 11 months ago
- On Transferability of Prompt Tuning for Natural Language Processing ☆99 · Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆159 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆165 · Updated 3 months ago
- ☆140 · Updated last year
- Scripts for fine-tuning Llama2 via SFT and DPO. ☆203 · Updated 2 years ago
- We have released the code and demo program required for LLM with self-verification ☆61 · Updated last year
- Counting-Stars (★) ☆83 · Updated 2 months ago
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆114 · Updated 2 years ago
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023. ☆64 · Updated 9 months ago
- Code for "Small Models are Valuable Plug-ins for Large Language Models" ☆131 · Updated 2 years ago
- ⚡Research papers about leveraging the capabilities of language models⚡ ☆52 · Updated 2 years ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆188 · Updated last year
- Source code for the paper "Active Prompting with Chain-of-Thought for Large Language Models" ☆243 · Updated last year
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆177 · Updated last year
- Code for "Democratizing Reasoning Ability: Tailored Learning from Large Language Model", EMNLP 2023 ☆36 · Updated last year
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆162 · Updated last year