allenai/RL4LMs

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/allenai/RL4LMs)

allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences

☆2,393

Alternatives and similar repositories for RL4LMs

Users that are interested in RL4LMs are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

CarperAI / trlx
View on GitHub
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,752Jan 8, 2024Updated 2 years ago
huggingface / trl
View on GitHub
Train transformer language models with reinforcement learning.
☆18,878Updated this week
openai / lm-human-preferences
View on GitHub
Code for the paper Fine-Tuning Language Models from Human Preferences
☆1,393Jul 25, 2023Updated 2 years ago
anthropics / hh-rlhf
View on GitHub
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,851Jun 17, 2025Updated last year
opendilab / awesome-RLHF
View on GitHub
A curated list of reinforcement learning with human feedback resources (continually updated)
☆4,413May 20, 2026Updated last month
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
lucidrains / PaLM-rlhf-pytorch
View on GitHub
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
☆7,865May 29, 2026Updated last month
voidful / TextRL
View on GitHub
Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-…
☆564Apr 23, 2026Updated 2 months ago
PKU-Alignment / safe-rlhf
View on GitHub
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,611Nov 24, 2025Updated 7 months ago
eric-mitchell / direct-preference-optimization
View on GitHub
Reference implementation for DPO (Direct Preference Optimization)
☆2,896Aug 11, 2024Updated last year
openai / summarize-from-feedback
View on GitHub
Code for "Learning to summarize from human feedback"
☆1,062Sep 5, 2023Updated 2 years ago
yizhongw / self-instruct
View on GitHub
Aligning pretrained language models with instruction data generated by themselves.
☆4,606Mar 27, 2023Updated 3 years ago
OpenLMLab / MOSS-RLHF
View on GitHub
Secrets of RLHF in Large Language Models Part I: PPO
☆1,427Mar 3, 2024Updated 2 years ago
huggingface / peft
View on GitHub
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,415Updated this week
OpenRLHF / OpenRLHF
View on GitHub
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,821Updated this week
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
tatsu-lab / alpaca_farm
View on GitHub
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
☆845Jul 1, 2024Updated 2 years ago
allenai / natural-instructions
View on GitHub
Expanding natural instructions
☆1,045Dec 11, 2023Updated 2 years ago
huggingface / alignment-handbook
View on GitHub
Robust recipes to align language models with human and AI preferences
☆5,639May 26, 2026Updated last month
GanjinZero / RRHF
View on GitHub
[NIPS2023] RRHF & Wombat
☆806Sep 22, 2023Updated 2 years ago
Sea-Snell / Implicit-Language-Q-Learning
View on GitHub
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆212Jul 31, 2023Updated 2 years ago
google-research / FLAN
View on GitHub
☆1,565Jul 2, 2026Updated 2 weeks ago
FranxYao / chain-of-thought-hub
View on GitHub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,777Aug 4, 2024Updated last year
CarperAI / cheese
View on GitHub
Used for adaptive human in the loop evaluation of language and embedding models.
☆306Mar 1, 2023Updated 3 years ago
yxuansu / SimCTG
View on GitHub
[NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation
☆478Mar 7, 2024Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
openai / prm800k
View on GitHub
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,150Jun 1, 2023Updated 3 years ago
Instruction-Tuning-with-GPT-4 / GPT-4-LLM
View on GitHub
Instruction Tuning with GPT-4
☆4,331Jun 11, 2023Updated 3 years ago
tatsu-lab / stanford_alpaca
View on GitHub
Code and documentation to train Stanford's Alpaca models, and generate the data.
☆30,252Jul 17, 2024Updated 2 years ago
EleutherAI / lm-evaluation-harness
View on GitHub
A framework for few-shot evaluation of language models.
☆13,336Updated this week
microsoft / unilm
View on GitHub
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
☆22,165Jan 23, 2026Updated 5 months ago
princeton-nlp / SimCSE
View on GitHub
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
☆3,655Oct 16, 2024Updated last year
deepspeedai / DeepSpeedExamples
View on GitHub
Example models using DeepSpeed
☆6,831Updated this week
bigscience-workshop / promptsource
View on GitHub
Toolkit for creating, sharing and using natural language prompts.
☆3,027Oct 23, 2023Updated 2 years ago
microsoft / LMOps
View on GitHub
General technology for enabling AI capabilities w/ LLMs and MLLMs
☆4,437Jun 17, 2026Updated last month
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
adapter-hub / adapters
View on GitHub
A Unified Library for Parameter-Efficient and Modular Transfer Learning
☆2,822Apr 26, 2026Updated 2 months ago
google / BIG-bench
View on GitHub
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
☆3,248Jul 19, 2024Updated 2 years ago
facebookresearch / metaseq
View on GitHub
Repo for external large-scale work
☆6,549Apr 27, 2024Updated 2 years ago
bitsandbytes-foundation / bitsandbytes
View on GitHub
Accessible large language models via k-bit quantization for PyTorch.
☆8,333Updated this week
OpenGVLab / LLaMA-Adapter
View on GitHub
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
☆5,916Mar 14, 2024Updated 2 years ago
NVIDIA / Megatron-LM
View on GitHub
Ongoing research training transformer models at scale
☆17,108Updated this week
allenai / open-instruct
View on GitHub
AllenAI's post-training codebase
☆3,800Updated this week