A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca.
☆61Apr 28, 2023Updated 2 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users that are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…☆139Apr 28, 2023Updated 2 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human…☆221May 20, 2024Updated last year
- LaTeX template for CUHK PhD Thesis☆12Jun 29, 2025Updated 9 months ago
- nlp_interview notes and answers: this repository mainly collects interview questions and reference answers for NLP algorithm engineers☆23Nov 16, 2023Updated 2 years ago
- Vietnamese GPT-J API service deployed with Docker & Helm chart☆10Dec 11, 2022Updated 3 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Jan 16, 2025Updated last year
- [EMNLP 2021] PyTorch Implementation of Contrastive Domain Adaptation for Question Answering using Limited Text Corpora☆14Jul 4, 2023Updated 2 years ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.☆843Jul 1, 2024Updated last year
- Code for the SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss"☆21Feb 3, 2021Updated 5 years ago
- Code for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022)☆33Jun 6, 2022Updated 3 years ago
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆56Jun 3, 2024Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat☆118Jun 5, 2023Updated 2 years ago
- ☆11Apr 21, 2023Updated 2 years ago
- This is a simple torch implementation of the high performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- ☆13Jul 2, 2025Updated 9 months ago
- [AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti…☆21Apr 1, 2024Updated 2 years ago
- A Tutorial on RAG and Fine-Tuning LLMs☆14Nov 27, 2023Updated 2 years ago
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo…☆32Oct 20, 2023Updated 2 years ago
- Minimal example of an OpenAI chat clone written in Streamlit with SOTA features.☆25Jul 11, 2023Updated 2 years ago
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA!☆15Nov 22, 2023Updated 2 years ago
- replicantlife is a framework for generative agents that can be used in a simulation engine or standalone. Agents are powered with metacog…☆34Apr 25, 2024Updated last year
- A (somewhat) minimal library for finetuning language models with PPO on human feedback.☆91Nov 23, 2022Updated 3 years ago
- Dataset Reset Policy Optimization☆31Apr 12, 2024Updated 2 years ago
- A casual and simple ChatGPT Python script that can run in a terminal (as long as you have an API). Supports the Azure API.☆21May 3, 2025Updated 11 months ago
- Very concise example of integrated gradients (a method to reveal areas of attention in input images)☆10Jun 17, 2019Updated 6 years ago
- Summarization with Pointer-Generator Networks☆15Sep 1, 2020Updated 5 years ago
- MTEB: Massive Text Embedding Benchmark☆11Jan 29, 2024Updated 2 years ago
- ☆12Oct 14, 2022Updated 3 years ago
- Langchain_CrewAI_Gemini - A Gemini-powered AI agent (multi-agent) project.☆14Mar 24, 2024Updated 2 years ago
- Comprehensive study notes on Artificial Intelligence: dedicated to the exploration and understanding of AI concepts, algorithms, and ap…☆20Jan 14, 2026Updated 3 months ago
- clip retrieval benchmark☆17May 4, 2022Updated 3 years ago
- Source code for SummaReranker (ACL 2022)☆25Jan 7, 2024Updated 2 years ago
- Fintech application: fraud risk identification☆16Mar 25, 2023Updated 3 years ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆43Mar 14, 2024Updated 2 years ago
- Large Language-and-Vision Assistant for BioMedicine, built towards multimodal GPT-4 level capabilities.☆10Nov 29, 2023Updated 2 years ago
- Official Implementation of "GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution"☆20Apr 3, 2024Updated 2 years ago
- ☆15Nov 24, 2020Updated 5 years ago
- ☆11May 11, 2022Updated 3 years ago