ashishjamarkattel / reinforment-learning-with-human-feedback
☆17Updated last year
Alternatives and similar repositories for reinforment-learning-with-human-feedback
Users who are interested in reinforment-learning-with-human-feedback are comparing it to the libraries listed below
- Notes and commented code for RLHF (PPO)☆116Updated last year
- ☆81Updated last year
- Master the essential steps of pretraining large language models (LLMs). Learn to create high-quality datasets, configure model architectu…☆24Updated last year
- Tutorial for how to build BERT from scratch☆100Updated last year
- Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI m…☆225Updated 2 years ago
- Building a 2.3M-parameter LLM from scratch with LLaMA 1 architecture.☆191Updated last year
- Instruct-tune Open LLaMA / RedPajama / StableLM models on consumer hardware using QLoRA☆81Updated last year
- LORA: Low-Rank Adaptation of Large Language Models implemented using PyTorch☆117Updated 2 years ago
- Reinforcement Learning using PyTorch☆11Updated last year
- minimal GRPO implementation from scratch☆99Updated 8 months ago
- A simplified LLAMA implementation for training and inference tasks.☆33Updated 4 months ago
- LLaMA 3 is one of the most promising open-source models after Mistral; we will recreate its architecture in a simpler manner.☆190Updated last year
- Distributed training (multi-node) of a Transformer model☆86Updated last year
- Apply LLMs to your data, build personal assistants, and expand your use of LLMs with agents, chains, and memories.☆132Updated 2 months ago
- Scripts for fine-tuning Llama2 via SFT and DPO.☆205Updated 2 years ago
- A minimum example of aligning language models with RLHF similar to ChatGPT☆224Updated 2 years ago
- ☆20Updated 4 years ago
- I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment☆17Updated 10 months ago
- ☆55Updated 2 months ago
- Implementation of Reinforcement Learning from Human Feedback (RLHF)☆173Updated 2 years ago
- ☆189Updated last year
- Building LLaMA 4 MoE from Scratch☆68Updated 7 months ago
- This repository contains a custom implementation of the BERT model, fine-tuned for specific tasks, along with an implementation of Low Ra…☆78Updated 2 years ago
- Resources relating to the DLAI event: https://www.youtube.com/watch?v=eTieetk2dSw☆188Updated 2 years ago
- ☆78Updated 2 years ago
- It is a comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) i…☆66Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.☆78Updated last year
- a simplified version of Meta's Llama 3 model to be used for learning☆43Updated last year
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA☆233Updated 3 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆125Updated 6 months ago