RLHFlow / RLHFlow.github.io

Webpage for RLHFlow

☆9

Alternatives and similar repositories for RLHFlow.github.io:

Users who are interested in RLHFlow.github.io are comparing it to the repositories listed below.

RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆56 Updated 5 months ago
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆23 Updated 5 months ago
thu-wyz / inference_scaling
☆55 Updated 3 months ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆49 Updated 4 months ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆47 Updated 4 months ago
facebookresearch / iGSM
The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces…
☆36 Updated last month
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆42 Updated 6 months ago
yinyueqin / relative-preference-optimization
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
☆21 Updated 11 months ago
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆66 Updated 6 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆31 Updated 7 months ago
SalesforceAIResearch / ThinK
ThinK: Thinner Key Cache by Query-Driven Pruning
☆15 Updated last week
gregorbachmann / Next-Token-Failures
☆80 Updated 11 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆127 Updated last week
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆121 Updated 3 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆48 Updated 2 months ago
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆50 Updated 8 months ago
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆102 Updated 11 months ago
WeiXiongUST / Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…
☆20 Updated 2 months ago
haotiansun14 / BBox-Adapter
Lightweight Adapting for Black-Box Large Language Models
☆19 Updated last year
wzhouad / WPO
Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"
☆37 Updated 4 months ago
GAIR-NLP / self-improvement-reversal
☆13 Updated 7 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆123 Updated last month
swtheing / PF-PPO-RLHF
☆30 Updated 5 months ago
ChangyuChen347 / MaskedThought
[ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
☆16 Updated 7 months ago
Linear95 / DSP
Domain-specific preference (DSP) data and customized RM fine-tuning.
☆24 Updated 11 months ago
rookie-joe / AutoPSV
☆41 Updated 3 months ago
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆29 Updated 8 months ago