allenai / FineGrainedRLHF

☆282

Alternatives and similar repositories for FineGrainedRLHF

Users who are interested in FineGrainedRLHF are comparing it to the repositories listed below.

OpenBMB / UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
☆361 · Dec 29, 2023 · Updated 2 years ago
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆169 · Sep 18, 2025 · Updated 4 months ago
OpenLMLab / MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
☆1,416 · Mar 3, 2024 · Updated last year
GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
☆250 · Jan 18, 2024 · Updated 2 years ago
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆415 · Apr 13, 2025 · Updated 10 months ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆64 · Nov 27, 2024 · Updated last year
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆687 · Jan 31, 2026 · Updated last week
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆66 · Aug 11, 2023 · Updated 2 years ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,742 · Jan 8, 2024 · Updated 2 years ago
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,582 · Nov 24, 2025 · Updated 2 months ago
vwxyzjn / summarize_from_feedback_details
☆160 · Nov 23, 2024 · Updated last year
opendilab / awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
☆4,289 · Dec 9, 2025 · Updated 2 months ago
haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆229 · Sep 30, 2023 · Updated 2 years ago
joeljang / RLPHF
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆116 · Oct 23, 2023 · Updated 2 years ago
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆20 · May 24, 2024 · Updated last year
GanjinZero / RRHF
[NIPS2023] RRHF & Wombat
☆808 · Sep 22, 2023 · Updated 2 years ago
princeton-nlp / ALCE
[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
☆511 · Oct 9, 2024 · Updated last year
naver-ai / ALMoST
☆24 · Dec 2, 2023 · Updated 2 years ago
nayeon7lee / FactualityPrompt
☆89 · Nov 11, 2022 · Updated 3 years ago
agi-templar / Stable-Alignment
Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…
☆354 · Jun 18, 2023 · Updated 2 years ago
eric-mitchell / direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
☆2,850 · Aug 11, 2024 · Updated last year
neelsjain / BYOD
The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"
☆107 · Sep 23, 2023 · Updated 2 years ago
dunzeng / MORE
Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment
☆16 · Aug 6, 2024 · Updated last year
llava-rlhf / LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
☆392 · Nov 1, 2023 · Updated 2 years ago
alexrame / rewardedsoups
Rewarded soups official implementation
☆62 · Sep 27, 2023 · Updated 2 years ago
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,814 · Jun 17, 2025 · Updated 7 months ago
JetRunner / SuperICL
Code for "Small Models are Valuable Plug-ins for Large Language Models"
☆132 · May 16, 2023 · Updated 2 years ago
joeljang / FLM
All-in-one repository for Fine-tuning & Pretraining (Large) Language Models
☆15 · Mar 8, 2023 · Updated 2 years ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆58 · Sep 23, 2024 · Updated last year
allenai / RL4LMs
A modular RL library to fine-tune language models to human preferences
☆2,377 · Mar 1, 2024 · Updated last year
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆178 · Nov 22, 2024 · Updated last year
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆416 · Jun 25, 2025 · Updated 7 months ago
allenai / open-instruct
AllenAI's post-training codebase
☆3,573 · Updated this week
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
☆8,989 · Feb 6, 2026 · Updated last week
IBM / Dromedary
Dromedary: towards helpful, ethical and reliable LLMs.
☆1,143 · Sep 18, 2025 · Updated 4 months ago
RUCAIBox / HaluEval
This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.
☆552 · Feb 12, 2024 · Updated 2 years ago
tatsu-lab / alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
☆842 · Jul 1, 2024 · Updated last year
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆151 · Feb 14, 2025 · Updated last year
openai / lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
☆1,377 · Jul 25, 2023 · Updated 2 years ago