A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
☆61 · Apr 28, 2023 · Updated 2 years ago
Alternatives and similar repositories for Alpaca-LoRA-RLHF-PyTorch
Users who are interested in Alpaca-LoRA-RLHF-PyTorch are comparing it to the libraries listed below.
- A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma… ☆140 · Apr 28, 2023 · Updated 2 years ago
- A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human… ☆221 · May 20, 2024 · Updated last year
- Latex template for CUHK PhD Thesis ☆11 · Jun 29, 2025 · Updated 8 months ago
- ☆12 · May 22, 2024 · Updated last year
- This repository provides installation scripts and configuration files for deploying the CSGHub instance, includes Helm charts and Docker… ☆18 · Updated this week
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment ☆17 · Jan 16, 2025 · Updated last year
- openai-tutorial ☆15 · Mar 5, 2023 · Updated 2 years ago
- CMU-OAQA LiveQA system ☆19 · Apr 7, 2016 · Updated 9 years ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. ☆90 · Nov 23, 2022 · Updated 3 years ago
- Alpaca-LoRA implementation for Hugging Face using DeepSpeed and FullyShardedDataParallel ☆24 · Apr 3, 2023 · Updated 2 years ago
- Code base for internal reward models and PPO training ☆24 · Oct 1, 2023 · Updated 2 years ago
- Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data" ☆20 · Feb 24, 2024 · Updated 2 years ago
- Download, parse, and filter PubMed data, ready for The-Pile ☆23 · Dec 16, 2021 · Updated 4 years ago
- Code for the SIGIR 2020 paper "A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss" ☆21 · Feb 3, 2021 · Updated 5 years ago
- nlp_interview notes and answers: this repository mainly collects interview questions and reference answers for NLP algorithm engineers ☆23 · Nov 16, 2023 · Updated 2 years ago
- A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data. ☆842 · Jul 1, 2024 · Updated last year
- ☆84 · Dec 16, 2025 · Updated 2 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆117 · Jun 5, 2023 · Updated 2 years ago
- replicantlife is a framework for generative agents that can be used in a simulation engine or standalone. Agents are powered with metacog… ☆34 · Apr 25, 2024 · Updated last year
- A first cut into exploring the use of dependency links for building Text Graphs, that, among other things, with help of a centrality algo… ☆32 · Oct 20, 2023 · Updated 2 years ago
- An open-source session replay tool for single-page applications that uses AI analysis, aggregated trends, and a RAG chatbot to help devel… ☆11 · Jan 23, 2026 · Updated last month
- a tiny, exploitable chatbot that can use tools ☆32 · Apr 5, 2023 · Updated 2 years ago
- Code for our CIKM 2019 paper. As far as we know, CONVEX is the first unsupervised method for conversational question answering over knowl… ☆28 · Sep 3, 2020 · Updated 5 years ago
- Launch machine learning models into production using Flask ☆12 · Aug 11, 2022 · Updated 3 years ago
- Web interface for building and managing your own agentic record label. ☆10 · Updated this week
- Use MobileNet SSD and OpenCV to detect and count cars on the road ☆12 · Jan 13, 2020 · Updated 6 years ago
- Automatic defect recognition in X-ray testing using computer vision ☆12 · Dec 8, 2018 · Updated 7 years ago
- Code for "Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation", ICCV 2023 ☆16 · Aug 31, 2023 · Updated 2 years ago
- RL algorithm for stock trading with multiple reward functions ☆11 · Apr 21, 2024 · Updated last year
- Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks ☆38 · May 25, 2021 · Updated 4 years ago
- Simplifies data migration between Apache Ignite clusters by relying on Apache Avro as an intermediate storage format ☆13 · Jun 27, 2023 · Updated 2 years ago
- Zhihuijun (稚晖君) Electron, ESP32 offline version ☆11 · Jan 15, 2025 · Updated last year
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆43 · Mar 14, 2024 · Updated last year
- This is a list used to collect the available (open-source / closed-source) projects that comply with Google Agent2Agent. ☆13 · Apr 24, 2025 · Updated 10 months ago
- MATLAB/Octave generator of Hamming ECC coding. Output format is Verilog HDL. ☆12 · Dec 27, 2022 · Updated 3 years ago
- 🕹 Pikachu-volleyball game-based multi-agent RL environment using PettingZoo ☆11 · Sep 29, 2024 · Updated last year
- Support for training SSD on TF2 ☆12 · Mar 29, 2023 · Updated 2 years ago
- Generate NLU data with Chatito ☆38 · Sep 25, 2018 · Updated 7 years ago
- Code to reproduce "GPT-too: A Language-Model-First Approach for AMR-to-Text-Generation" ☆38 · Sep 17, 2025 · Updated 5 months ago