A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca
☆60Apr 28, 2023Updated 3 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated last year
- Latex template for CUHK PhD Thesis☆13Jun 29, 2025Updated 10 months ago
- nlp_interview notes and answers: 该仓库主要记录 NLP 算法工程师相关的面试题和参考答案☆23Nov 16, 2023Updated 2 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Jan 16, 2025Updated last year
- [EMNLP 2021] PyTorch Implementation of Contrastive Domain Adaptation for Question Answering using Limited Text Corpora☆14Jul 4, 2023Updated 2 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Python scripts for setting up private LLM's on local and in the cloud with LangChain, GPT4All and Cerebrium☆11May 29, 2023Updated 2 years ago
- ☆12May 22, 2024Updated last year
- ☆10Oct 4, 2024Updated last year
- ☆13Sep 20, 2023Updated 2 years ago
- ☆11Jun 27, 2019Updated 6 years ago
- In-context learning, Fine-Tuning, RLHF on Flan-T5☆13Aug 30, 2023Updated 2 years ago
- Code for the SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss"☆21Feb 3, 2021Updated 5 years ago
- ☆10Oct 31, 2022Updated 3 years ago
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)☆33Jun 6, 2022Updated 3 years ago
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Alpaca-lora for huggingface implementation using Deepspeed and FullyShardedDataParallel☆24Apr 3, 2023Updated 3 years ago
- Implementation of Wasserstein Generative Adversarial Networks using Tensorflow☆12Jul 25, 2018Updated 7 years ago
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- openai-tutorial☆14Mar 5, 2023Updated 3 years ago
- Natural Language Generation by Hierarchical Decoding with Linguistic Patterns (NAACL-HLT 2018), Investigating Linguistic Pattern Ordering…☆32Sep 23, 2018Updated 7 years ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆56Jun 3, 2024Updated last year
- Explains Canadian Bills☆17May 13, 2023Updated 2 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆118Jun 5, 2023Updated 2 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- This repo is the official implementation of the ICLR'23 paper "Towards Robustness Certification Against Universal Perturbations." We calc…☆12Feb 14, 2023Updated 3 years ago
- ☆13Jul 2, 2025Updated 10 months ago
- Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"☆20Feb 24, 2024Updated 2 years ago
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification☆19Aug 10, 2024Updated last year
- [AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti…☆21Apr 1, 2024Updated 2 years ago
- A Tutorial on RAG and Fine-Tuning LLMs☆14Nov 27, 2023Updated 2 years ago
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo…☆32Oct 20, 2023Updated 2 years ago
- A new collection of medical VQA dataset based on MIMIC-CXR. Part of the work 'EHRXQA: A Multi-Modal Question Answering Dataset for Electr…☆99Feb 6, 2026Updated 2 months ago
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA!☆15Nov 22, 2023Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- A (somewhat) minimal library for finetuning language models with PPO on human feedback.☆91Nov 23, 2022Updated 3 years ago
- Dateset Reset Policy Optimization☆31Apr 12, 2024Updated 2 years ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths☆35Dec 1, 2023Updated 2 years ago
- ☆11Jul 11, 2023Updated 2 years ago
- MTEB: Massive Text Embedding Benchmark☆11Jan 29, 2024Updated 2 years ago
- ☆12Oct 14, 2022Updated 3 years ago
- Langchain_CrewAI_Gemini - An Gemini AI powered AI Agent (Multi-Agent) Project.☆14Mar 24, 2024Updated 2 years ago