Dahoas / reward-modeling
☆99 · May 30, 2023 · Updated 2 years ago
Alternatives and similar repositories for reward-modeling
Users interested in reward-modeling are comparing it to the libraries listed below.
- A repository for transformer critique learning and generation ☆89 · Dec 7, 2023 · Updated 2 years ago
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. ☆90 · Nov 23, 2022 · Updated 3 years ago
- Experiments with generating opensource language model assistants ☆97 · May 14, 2023 · Updated 2 years ago
- [NIPS2023] RRHF & Wombat ☆808 · Sep 22, 2023 · Updated 2 years ago
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,742 · Jan 8, 2024 · Updated 2 years ago
- ☆35 · Jan 29, 2023 · Updated 3 years ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-… ☆565 · May 9, 2024 · Updated last year
- ☆158 · Mar 18, 2023 · Updated 2 years ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 · Nov 8, 2022 · Updated 3 years ago
- ☆12 · Jan 17, 2025 · Updated last year
- A modular RL library to fine-tune language models to human preferences ☆2,377 · Mar 1, 2024 · Updated last year
- ☆33 · Apr 23, 2023 · Updated 2 years ago
- A swarm of LLM agents that will help you test, document, and productionize your code! ☆16 · Feb 7, 2026 · Updated last week
- Training AI for Super Smash Bros. Melee ☆32 · Mar 27, 2025 · Updated 10 months ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- ☆14 · May 8, 2023 · Updated 2 years ago
- Code for the paper Fine-Tuning Language Models from Human Preferences ☆1,377 · Jul 25, 2023 · Updated 2 years ago
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆180 · Feb 13, 2024 · Updated 2 years ago
- Simple next-token-prediction for RLHF ☆229 · Sep 30, 2023 · Updated 2 years ago
- [ICML 2024] Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization ☆15 · May 12, 2024 · Updated last year
- This is a pip package implementing Reinforcement Learning algorithms in non-stationary environments supported by the OpenAI Gym toolkit. ☆16 · Jun 28, 2024 · Updated last year
- ☆14 · Aug 15, 2024 · Updated last year
- AdamW optimizer for bfloat16 models in pytorch 🔥. ☆39 · Jun 16, 2024 · Updated last year
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,814 · Jun 17, 2025 · Updated 7 months ago
- Hidden Engrams: Long Term Memory for Transformer Model Inference ☆35 · Jun 26, 2021 · Updated 4 years ago
- Pytorch implementation on OpenAI's Procgen ppo-baseline, built from scratch. ☆14 · May 17, 2024 · Updated last year
- Redwood Research's transformer interpretability tools ☆15 · Apr 15, 2022 · Updated 3 years ago
- code and data for paper "GIANT: Scalable Creation of a Web-scale Ontology" ☆39 · Apr 22, 2020 · Updated 5 years ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" ☆211 · Jul 31, 2023 · Updated 2 years ago
- The code for the video tutorial series on building a Transformer from scratch: https://www.youtube.com/watch?v=XR4VDnJzB8o ☆19 · Apr 15, 2023 · Updated 2 years ago
- Code for our SIGGRAPH 2023 paper, "Acting as Inverse Inverse Planning" ☆19 · Apr 21, 2023 · Updated 2 years ago
- Implementation of Reinforcement Learning from Human Feedback (RLHF) ☆173 · Apr 7, 2023 · Updated 2 years ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆45 · Oct 1, 2025 · Updated 4 months ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Jun 21, 2023 · Updated 2 years ago
- Text to Speech Synthesis based on controllable latent representation ☆14 · Aug 30, 2019 · Updated 6 years ago