jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
☆60 Updated 2 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users who are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following ☆79 Updated last year
- Code for ACL2023 paper: Pre-Training to Learn in Context ☆106 Updated last year
- On Transferability of Prompt Tuning for Natural Language Processing ☆100 Updated last year
- ☆98 Updated 2 years ago
- the instructions and demonstrations for building a formal logical reasoning capable GLM ☆54 Updated last year
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆90 Updated 2 years ago
- ☆144 Updated 2 years ago
- ⚡Research papers about leveraging the capabilities of language models⚡ ☆52 Updated 3 weeks ago
- This repository is the official implementation of our paper MVP: Multi-task Supervised Pre-training for Natural Language Generation. ☆73 Updated 3 years ago
- Code for "Small Models are Valuable Plug-ins for Large Language Models" ☆132 Updated 2 years ago
- Self-Alignment with Principle-Following Reward Models ☆169 Updated 4 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆137 Updated 9 months ago
- PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models ☆111 Updated last month
- [NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`. ☆156 Updated 2 years ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆172 Updated last year
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆222 Updated 2 years ago
- Unofficial implementation of AlpaGasus ☆94 Updated 2 years ago
- ☆75 Updated last year
- Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023. ☆64 Updated last year
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023) ☆64 Updated 2 years ago
- Simple next-token-prediction for RLHF ☆228 Updated 2 years ago
- [ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM ☆190 Updated last year
- Source codes and datasets for How well do Large Language Models perform in Arithmetic tasks? ☆57 Updated 2 years ago
- Contrastive Chain-of-Thought Prompting ☆68 Updated 2 years ago
- The corresponding code from our paper "REFINER: Reasoning Feedback on Intermediate Representations" (EACL 2024). Do not hesitate t… ☆74 Updated 2 weeks ago
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆254 Updated 2 years ago
- ☆173 Updated 2 years ago
- ⏳ ChatLog: Recording and Analysing ChatGPT Across Time ☆103 Updated last year
- A framework for human-readable prompt-based method with large language models. Specially designed for researchers. (Deprecated, check out… ☆131 Updated 2 years ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆85 Updated last year