A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca.
☆60 · Apr 28, 2023 · Updated 3 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human… ☆221 · May 20, 2024 · Updated 2 years ago
- Latex template for CUHK PhD Thesis ☆14 · Jun 29, 2025 · Updated 10 months ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆239 · Aug 17, 2025 · Updated 9 months ago
- Vietnamese GPT-J API service deployed with Docker & Helm chart ☆10 · Dec 11, 2022 · Updated 3 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment ☆17 · Jan 16, 2025 · Updated last year
- [EMNLP 2021] PyTorch Implementation of Contrastive Domain Adaptation for Question Answering using Limited Text Corpora ☆14 · Jul 4, 2023 · Updated 2 years ago
- Python scripts for setting up private LLM's on local and in the cloud with LangChain, GPT4All and Cerebrium ☆11 · May 29, 2023 · Updated 2 years ago
- Launch machine learning models into production using flask ☆15 · Aug 11, 2022 · Updated 3 years ago
- ☆11 · Jun 27, 2019 · Updated 6 years ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. ☆845 · Jul 1, 2024 · Updated last year
- A pytest plugin to organize and track algorithm visualizations ☆18 · Dec 1, 2024 · Updated last year
- In-context learning, Fine-Tuning, RLHF on Flan-T5 ☆13 · Aug 30, 2023 · Updated 2 years ago
- MSBD5001 Big Data Computing Projects -- Algorithm Parallelization. Use PySpark APIs to implement DBSCAN algorithm. ☆18 · Aug 14, 2019 · Updated 6 years ago
- Alpaca-lora for huggingface implementation using Deepspeed and FullyShardedDataParallel ☆24 · Apr 3, 2023 · Updated 3 years ago
- Implementation of Wasserstein Generative Adversarial Networks using Tensorflow ☆12 · Jul 25, 2018 · Updated 7 years ago
- Code base for internal reward models and PPO training ☆24 · Oct 1, 2023 · Updated 2 years ago
- openai-tutorial ☆14 · Mar 5, 2023 · Updated 3 years ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆56 · Jun 3, 2024 · Updated last year
- Explains Canadian Bills ☆17 · May 13, 2023 · Updated 3 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆118 · Jun 5, 2023 · Updated 2 years ago
- This repo is the official implementation of the ICLR'23 paper "Towards Robustness Certification Against Universal Perturbations." We calc… ☆12 · Feb 14, 2023 · Updated 3 years ago
- ☆13 · Jul 2, 2025 · Updated 10 months ago
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification ☆19 · Aug 10, 2024 · Updated last year
- [AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti… ☆21 · Apr 1, 2024 · Updated 2 years ago
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo… ☆32 · Oct 20, 2023 · Updated 2 years ago
- A Python wrapper for the ROUGE summarization evaluation package ☆14 · Aug 9, 2017 · Updated 8 years ago
- Bullseye Polytope Clean-Label Poisoning Attack ☆18 · Nov 5, 2020 · Updated 5 years ago
- A new collection of medical VQA dataset based on MIMIC-CXR. Part of the work 'EHRXQA: A Multi-Modal Question Answering Dataset for Electr… ☆99 · Feb 6, 2026 · Updated 3 months ago
- An evaluation framework for mitigating DNN backdoor attacks using data augmentations ☆11 · Dec 10, 2020 · Updated 5 years ago
- Applying Deep Reinforcement Learning for dialogue generation. aka chatbot ☆13 · Apr 30, 2017 · Updated 9 years ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths ☆35 · Dec 1, 2023 · Updated 2 years ago
- ☆11 · Jul 11, 2023 · Updated 2 years ago
- MTEB: Massive Text Embedding Benchmark ☆11 · Jan 29, 2024 · Updated 2 years ago
- Official implementation for Text Generation Beyond Discrete Token Sampling ☆25 · Aug 11, 2025 · Updated 9 months ago
- ☆31 · Sep 20, 2021 · Updated 4 years ago
- clip retrieval benchmark ☆17 · May 4, 2022 · Updated 4 years ago
- Fintech application: fraud risk identification ☆16 · Mar 25, 2023 · Updated 3 years ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆43 · Mar 14, 2024 · Updated 2 years ago
- Source code for SummaReranker (ACL 2022) ☆24 · Jan 7, 2024 · Updated 2 years ago