A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. An implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
☆60Apr 28, 2023Updated 3 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆138Apr 28, 2023Updated 3 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆220May 20, 2024Updated 2 years ago
- Latex template for CUHK PhD Thesis☆14Jun 29, 2025Updated last year
- nlp_interview notes and answers: this repository mainly records interview questions and reference answers for NLP algorithm engineers☆24Nov 16, 2023Updated 2 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆239Aug 17, 2025Updated 10 months ago
- Vietnamese GPT-J API service deployed with Docker & Helm chart☆10Dec 11, 2022Updated 3 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Jan 16, 2025Updated last year
- [EMNLP 2021] PyTorch Implementation of Contrastive Domain Adaptation for Question Answering using Limited Text Corpora☆14Jul 4, 2023Updated 2 years ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.☆845Jul 1, 2024Updated 2 years ago
- Code for the SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss"☆21Feb 3, 2021Updated 5 years ago
- MSBD5001 Big Data Computing Projects -- Algorithm Parallelization. Use PySpark APIs to implement DBSCAN algorithm.☆18Aug 14, 2019Updated 6 years ago
- ☆10Oct 31, 2022Updated 3 years ago
- Alpaca-lora for huggingface implementation using Deepspeed and FullyShardedDataParallel☆24Apr 3, 2023Updated 3 years ago
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- Natural Language Generation by Hierarchical Decoding with Linguistic Patterns (NAACL-HLT 2018), Investigating Linguistic Pattern Ordering…☆32Sep 23, 2018Updated 7 years ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆54Jun 3, 2024Updated 2 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆117Jun 5, 2023Updated 3 years ago
- Scheduled crawler for daily arXiv papers☆13May 22, 2023Updated 3 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- This repo is the official implementation of the ICLR'23 paper "Towards Robustness Certification Against Universal Perturbations." We calc…☆12Feb 14, 2023Updated 3 years ago
- ☆13Jul 2, 2025Updated last year
- Code for KDD 2023 long paper: MetricPrompt: Prompting Model as a Relevance Metric for Few-Shot Text Classification☆19Aug 10, 2024Updated last year
- [AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti…☆21Apr 1, 2024Updated 2 years ago
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo…☆32Oct 20, 2023Updated 2 years ago
- Bullseye Polytope Clean-Label Poisoning Attack☆18Nov 5, 2020Updated 5 years ago
- This project collects methods that enhance the comparison between AMR graphs.☆11Jun 15, 2023Updated 3 years ago
- A new collection of medical VQA dataset based on MIMIC-CXR. Part of the work 'EHRXQA: A Multi-Modal Question Answering Dataset for Electr…☆101Feb 6, 2026Updated 4 months ago
- Applying Deep Reinforcement Learning for dialogue generation. aka chatbot☆13Apr 30, 2017Updated 9 years ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths☆35Dec 1, 2023Updated 2 years ago
- Very concise example of integrated gradients (a method to reveal areas of attention in input images)☆10Jun 17, 2019Updated 7 years ago
- ☆11Jul 11, 2023Updated 2 years ago
- Summarization with Pointer-Generator Networks☆15Sep 1, 2020Updated 5 years ago
- MTEB: Massive Text Embedding Benchmark☆11Jan 29, 2024Updated 2 years ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed.☆127Jun 26, 2026Updated last week
- Comprehensive study notes on Artificial Intelligence: dedicated to the exploration and understanding of AI concepts, algorithms, and ap…☆20Jan 14, 2026Updated 5 months ago
- ☆15Jan 9, 2026Updated 5 months ago
- ☆31Sep 20, 2021Updated 4 years ago
- clip retrieval benchmark☆17May 4, 2022Updated 4 years ago
- Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.☆10Nov 29, 2023Updated 2 years ago