A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
☆60Apr 28, 2023Updated 3 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆138Apr 28, 2023Updated 3 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated 2 years ago
- Latex template for CUHK PhD Thesis☆14Jun 29, 2025Updated 11 months ago
- nlp_interview notes and answers: this repository mainly records interview questions and reference answers for NLP algorithm engineers☆23Nov 16, 2023Updated 2 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆239Aug 17, 2025Updated 9 months ago
- Vietnamese GPT-J API service deployed with Docker & Helm chart☆10Dec 11, 2022Updated 3 years ago
- [EMNLP 2021] PyTorch Implementation of Contrastive Domain Adaptation for Question Answering using Limited Text Corpora☆14Jul 4, 2023Updated 2 years ago
- Python scripts for setting up private LLMs locally and in the cloud with LangChain, GPT4All and Cerebrium☆11May 29, 2023Updated 3 years ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.☆844Jul 1, 2024Updated last year
- Code for the SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss"☆21Feb 3, 2021Updated 5 years ago
- MSBD5001 Big Data Computing Projects -- Algorithm Parallelization. Use PySpark APIs to implement DBSCAN algorithm.☆18Aug 14, 2019Updated 6 years ago
- ☆10Oct 31, 2022Updated 3 years ago
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)☆33Jun 6, 2022Updated 4 years ago
- Implementation of Wasserstein Generative Adversarial Networks using Tensorflow☆12Jul 25, 2018Updated 7 years ago
- This repo contains data and code for the paper "Reasoning over Public and Private Data in Retrieval-Based Systems."☆46Jul 18, 2024Updated last year
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- openai-tutorial☆14Mar 5, 2023Updated 3 years ago
- Natural Language Generation by Hierarchical Decoding with Linguistic Patterns (NAACL-HLT 2018), Investigating Linguistic Pattern Ordering…☆32Sep 23, 2018Updated 7 years ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆55Jun 3, 2024Updated 2 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆117Jun 5, 2023Updated 3 years ago
- ☆10Jul 21, 2017Updated 8 years ago
- Scheduled crawler for daily arXiv papers☆13May 22, 2023Updated 3 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- This repo is the official implementation of the ICLR'23 paper "Towards Robustness Certification Against Universal Perturbations." We calc…☆12Feb 14, 2023Updated 3 years ago
- UDT: UDP-based Data Transfer Protocol☆11Apr 21, 2018Updated 8 years ago
- ☆13Jul 2, 2025Updated 11 months ago
- Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"☆20Feb 24, 2024Updated 2 years ago
- Developing, training, and assessing the performance of a Proximal Policy Optimization (PPO) Stock Trading Agent.☆14Aug 20, 2025Updated 9 months ago
- Code for the paper "Machine-Guided Discovery of a Real-World Rogue Wave Model" (Häfner et al., 2023)☆14Sep 25, 2023Updated 2 years ago
- Organic Electronic Device Simulator☆16Dec 14, 2019Updated 6 years ago
- [AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti…☆21Apr 1, 2024Updated 2 years ago
- A Tutorial on RAG and Fine-Tuning LLMs☆15Nov 27, 2023Updated 2 years ago
- Official implementation for "Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition" (I…☆42May 15, 2023Updated 3 years ago
- This project collects methods that enhance the comparison between AMR graphs.☆11Jun 15, 2023Updated 2 years ago
- An evaluation framework for mitigating DNN backdoor attacks using data augmentations☆11Dec 10, 2020Updated 5 years ago
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA!☆15Nov 22, 2023Updated 2 years ago
- A simple example for finetuning HuggingFace T5 model. Includes code for intermediate generation.☆26Nov 11, 2020Updated 5 years ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback.☆91Nov 23, 2022Updated 3 years ago
- Applying Deep Reinforcement Learning for dialogue generation. aka chatbot☆13Apr 30, 2017Updated 9 years ago