lucidrains/self-rewarding-lm-pytorch

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/lucidrains/self-rewarding-lm-pytorch)

lucidrains / self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI

☆1,411

Alternatives and similar repositories for self-rewarding-lm-pytorch

Users that are interested in self-rewarding-lm-pytorch are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

uclaml / SPIN
View on GitHub
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,247May 8, 2024Updated 2 years ago
huggingface / alignment-handbook
View on GitHub
Robust recipes to align language models with human and AI preferences
☆5,639May 26, 2026Updated last month
eric-mitchell / direct-preference-optimization
View on GitHub
Reference implementation for DPO (Direct Preference Optimization)
☆2,896Aug 11, 2024Updated last year
Oxen-AI / Self-Rewarding-Language-Models
View on GitHub
This is work done by the Oxen.ai Community, trying to reproduce the Self-Rewarding Language Model paper from MetaAI.
☆135Nov 16, 2024Updated last year
arcee-ai / mergekit
View on GitHub
Tools for merging pretrained large language models.
☆7,250Jun 17, 2026Updated last month
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
allenai / reward-bench
View on GitHub
RewardBench: the first evaluation tool for reward models.
☆727Feb 16, 2026Updated 5 months ago
xfactlab / orpo
View on GitHub
Official repository for ORPO
☆480May 31, 2024Updated 2 years ago
ContextualAI / HALOs
View on GitHub
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
☆908Sep 30, 2025Updated 9 months ago
huggingface / trl
View on GitHub
Train transformer language models with reinforcement learning.
☆18,898Updated this week
thomasgauthier / LLM-self-play
View on GitHub
Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (ArXiv 20232401.01335)
☆29Mar 1, 2024Updated 2 years ago
Codium-ai / AlphaCodium
View on GitHub
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering""
☆3,948Nov 25, 2024Updated last year
mit-han-lab / streaming-llm
View on GitHub
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,248Jul 11, 2024Updated 2 years ago
RLHFlow / RLHF-Reward-Modeling
View on GitHub
Recipes to train reward model for RLHF.
☆1,534Apr 24, 2025Updated last year
nlpxucan / WizardLM
View on GitHub
LLMs build upon Evol Insturct: WizardLM, WizardCoder, WizardMath
☆9,480Jun 7, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
huggingface / nanotron
View on GitHub
Minimalistic large language model 3D-parallelism training
☆2,760May 26, 2026Updated last month
CarperAI / trlx
View on GitHub
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753Jan 8, 2024Updated 2 years ago
jzhang38 / TinyLlama
View on GitHub
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
☆9,015May 3, 2024Updated 2 years ago
allenai / easy-to-hard-generalization
View on GitHub
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Jan 17, 2024Updated 2 years ago
axolotl-ai-cloud / axolotl
View on GitHub
Go ahead and axolotl questions
☆12,222Updated this week
argilla-io / distilabel
View on GitHub
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,339Updated this week
allenai / open-instruct
View on GitHub
AllenAI's post-training codebase
☆3,803Updated this week
OpenRLHF / OpenRLHF
View on GitHub
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,831Jul 14, 2026Updated last week
tatsu-lab / alpaca_farm
View on GitHub
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
☆845Jul 1, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
huggingface / datatrove
View on GitHub
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,217Updated this week
QuixiAI / laserRMT
View on GitHub
This is our own implementation of 'Layer Selective Rank Reduction'
☆240May 26, 2024Updated 2 years ago
allenai / OLMo
View on GitHub
Modeling, training, eval, and inference code for OLMo
☆6,600Nov 24, 2025Updated 7 months ago
AnswerDotAI / fsdp_qlora
View on GitHub
Training LLMs with QLoRA + FSDP
☆1,549Nov 9, 2024Updated last year
LargeWorldModel / LWM
View on GitHub
Large World Model -- Modeling Text and Video with Millions Context
☆7,427Oct 19, 2024Updated last year
jondurbin / bagel
View on GitHub
A bagel, with everything.
☆326Apr 11, 2024Updated 2 years ago
thomasgauthier / LoRD
View on GitHub
Low-Rank adapter extraction for fine-tuned transformers models
☆181May 2, 2024Updated 2 years ago
openai / transformer-debugger
View on GitHub
☆4,117Apr 15, 2026Updated 3 months ago
AnswerDotAI / RAGatouille
View on GitHub
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,940May 17, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
OpenLMLab / MOSS-RLHF
View on GitHub
Secrets of RLHF in Large Language Models Part I: PPO
☆1,427Mar 3, 2024Updated 2 years ago
FasterDecoding / Medusa
View on GitHub
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
☆2,757Jun 25, 2024Updated 2 years ago
datamllab / LongLM
View on GitHub
[ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
☆668Jun 1, 2024Updated 2 years ago
hkust-nlp / simpleRL-reason
View on GitHub
Simple RL training for reasoning
☆3,868Dec 23, 2025Updated 6 months ago
yuchenlin / LLM-Blender
View on GitHub
[ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive…
☆989Oct 22, 2024Updated last year
schauppi / Self-Rewarding-Language-Models
View on GitHub
☆50May 13, 2024Updated 2 years ago
openai / weak-to-strong
View on GitHub
☆2,549May 19, 2024Updated 2 years ago