huggingface / trl
Train transformer language models with reinforcement learning.
⭐17,115 · Updated this week
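Before scanning the alternatives, a quick sense of what trl does may help. The snippet below is a minimal supervised fine-tuning sketch using trl's SFTTrainer; it is a sketch under assumptions, not an official example, and the checkpoint and dataset names (Qwen/Qwen2.5-0.5B, trl-lib/Capybara) are illustrative choices rather than requirements.

```python
# Minimal SFT sketch with trl (pip install trl datasets).
# Checkpoint and dataset names below are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load a small chat-style dataset from the Hub.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",                # any causal-LM checkpoint works here
    args=SFTConfig(output_dir="sft-output"),  # TrainingArguments-style config
    train_dataset=dataset,
)
trainer.train()
```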
Alternatives and similar repositories for trl
Users interested in trl are comparing it to the libraries listed below.
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning (see the LoRA sketch after this list). ⭐20,502 · Updated this week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models". ⭐13,199 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs. ⭐10,822 · Updated last year
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐9,461 · Updated last week
- An easy-to-use, scalable, and high-performance agentic RL framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL). ⭐8,851 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ⭐7,912 · Updated this week
- A framework for few-shot evaluation of language models. ⭐11,246 · Updated last week
- verl: Volcano Engine Reinforcement Learning for LLMs. ⭐18,535 · Updated last week
- Fast and memory-efficient exact attention. ⭐21,773 · Updated this week
- PyTorch native post-training library. ⭐5,654 · Updated this week
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF). ⭐4,738 · Updated 2 years ago
- Ongoing research training transformer models at scale. ⭐15,016 · Updated this week
- Robust recipes to align language models with human and AI preferences. ⭐5,481 · Updated 4 months ago
- Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ⭐18,167 · Updated 2 months ago
- Tools for merging pretrained large language models. ⭐6,696 · Updated 3 weeks ago
- 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale. ⭐13,110 · Updated last week
- Reference implementation for DPO (Direct Preference Optimization). ⭐2,832 · Updated last year
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ⭐7,544 · Updated this week
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Ad… ⭐6,093 · Updated 6 months ago
- Aligning pretrained language models with instruction data generated by themselves. ⭐4,564 · Updated 2 years ago
- [ICLR 2024] Fine-tuning LLaMA to follow instructions within 1 hour and 1.2M parameters. ⭐5,936 · Updated last year
- Retrieval and retrieval-augmented LLMs. ⭐11,187 · Updated last month
- Example models using DeepSpeed. ⭐6,777 · Updated last month
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ⭐8,880 · Updated last year
- OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, … ⭐6,588 · Updated last week
- General technology for enabling AI capabilities with LLMs and MLLMs. ⭐4,262 · Updated last month
- Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (… ⭐12,337 · Updated this week
- AllenAI's post-training codebase. ⭐3,538 · Updated this week
- Modeling, training, eval, and inference code for OLMo. ⭐6,294 · Updated 2 months ago
- A curated list of reinforcement learning with human feedback resources (continually updated). ⭐4,275 · Updated last month
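As noted in the PEFT entry at the top of the list, a minimal LoRA sketch with peft looks like the following. It assumes a causal LM loaded via transformers; the base checkpoint and the target_modules names are assumptions that vary by architecture.

```python
# Minimal LoRA sketch with peft (pip install peft transformers).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base checkpoint is an illustrative assumption; any causal LM on the Hub works.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by model
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The two libraries also compose: trl's trainers accept a peft_config argument, so an adapter like the one above can be trained directly inside an SFT or preference-tuning run.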