A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
☆61Apr 28, 2023Updated 2 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated last year
- AI driven Web Application Firewall☆32Dec 12, 2022Updated 3 years ago
- nlp_interview notes and answers: this repository mainly collects interview questions and reference answers for NLP algorithm engineers☆23Nov 16, 2023Updated 2 years ago
- Vietnamese GPT-J API service deployed with Docker & Helm chart☆10Dec 11, 2022Updated 3 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Jan 16, 2025Updated last year
- [EMNLP 2021] PyTorch Implementation of Contrastive Domain Adaptation for Question Answering using Limited Text Corpora☆14Jul 4, 2023Updated 2 years ago
- CMOS, a large-scale Chinese moral sentence dataset☆10Apr 11, 2022Updated 3 years ago
- Python scripts for setting up private LLMs locally and in the cloud with LangChain, GPT4All and Cerebrium☆11May 29, 2023Updated 2 years ago
- Launch machine learning models into production using Flask☆13Aug 11, 2022Updated 3 years ago
- ☆11Jun 27, 2019Updated 6 years ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.☆842Jul 1, 2024Updated last year
- In-context learning, Fine-Tuning, RLHF on Flan-T5☆13Aug 30, 2023Updated 2 years ago
- Code for the SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss"☆21Feb 3, 2021Updated 5 years ago
- ☆10Oct 31, 2022Updated 3 years ago
- Code for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)☆33Jun 6, 2022Updated 3 years ago
- Alpaca-lora for huggingface implementation using Deepspeed and FullyShardedDataParallel☆24Apr 3, 2023Updated 2 years ago
- Implementation of Wasserstein Generative Adversarial Networks using Tensorflow☆12Jul 25, 2018Updated 7 years ago
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- openai-tutorial☆15Mar 5, 2023Updated 3 years ago
- Natural Language Generation by Hierarchical Decoding with Linguistic Patterns (NAACL-HLT 2018), Investigating Linguistic Pattern Ordering…☆32Sep 23, 2018Updated 7 years ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆56Jun 3, 2024Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆117Jun 5, 2023Updated 2 years ago
- Explains Canadian Bills☆17May 13, 2023Updated 2 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- This repo is the official implementation of the ICLR'23 paper "Towards Robustness Certification Against Universal Perturbations." We calc…☆12Feb 14, 2023Updated 3 years ago
- Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"☆20Feb 24, 2024Updated 2 years ago
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification☆19Aug 10, 2024Updated last year
- Showcase app for Theming (Light Theme)☆18Mar 22, 2023Updated 3 years ago
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo…☆32Oct 20, 2023Updated 2 years ago
- A Python wrapper for the ROUGE summarization evaluation package☆14Aug 9, 2017Updated 8 years ago
- A new collection of medical VQA dataset based on MIMIC-CXR. Part of the work 'EHRXQA: A Multi-Modal Question Answering Dataset for Electr…☆97Feb 6, 2026Updated last month
- replicantlife is a framework for generative agents that can be used in a simulation engine or standalone. Agents are powered with metacog…☆34Apr 25, 2024Updated last year
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA!☆15Nov 22, 2023Updated 2 years ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback.☆90Nov 23, 2022Updated 3 years ago
- Applying Deep Reinforcement Learning for dialogue generation. aka chatbot☆13Apr 30, 2017Updated 8 years ago
- Dataset Reset Policy Optimization☆31Apr 12, 2024Updated last year
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths☆35Dec 1, 2023Updated 2 years ago
- ☆11Jul 11, 2023Updated 2 years ago
- Very concise example of integrated gradients (a method to reveal areas of attention in input images)☆10Jun 17, 2019Updated 6 years ago