vwxyzjn / summarize_from_feedback_details
☆160, updated Nov 23, 2024
Alternatives and similar repositories for summarize_from_feedback_details
Users interested in summarize_from_feedback_details are comparing it to the libraries listed below.
- RLHF implementation details of OAI's 2019 codebase (☆197, updated Jan 14, 2024)
- RewardBench: the first evaluation tool for reward models (☆687, updated Jan 31, 2026)
- Official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" (☆17, updated Feb 22, 2024)
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models (☆28, updated Mar 22, 2024)
- Dataset Reset Policy Optimization (☆31, updated Apr 12, 2024)
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) (☆73, updated Jun 25, 2024)
- ☆13, updated Jun 4, 2024
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" (☆33, updated Dec 14, 2023)
- Directional Preference Alignment (☆58, updated Sep 23, 2024)
- ☆322, updated Jul 25, 2024
- ☆282, updated Jan 6, 2025
- Recipes to train reward models for RLHF (☆1,512, updated Apr 24, 2025)
- [ACL 2024 Findings] DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling (☆18, updated Jun 6, 2024)
- ☆16, updated Jul 23, 2024
- Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning" (☆20, updated Oct 6, 2021)
- Code and example data for the paper "Rule Based Rewards for Language Model Safety" (☆206, updated Jul 19, 2024)
- Gym wrapper for pysc2 (☆10, updated Sep 16, 2022)
- ☆11, updated Mar 13, 2023
- ☆20, updated Nov 4, 2025
- A Framework for Decoupling and Assessing the Capabilities of VLMs (☆43, updated Jun 28, 2024)
- Counterfactual Evaluation and Learning for Interactive Systems: Foundations, Implementations, and Recent Advances (☆12, updated Aug 14, 2022)
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" (☆47, updated Jan 19, 2024)
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] (☆588, updated Dec 9, 2024)
- ☆313, updated Jun 9, 2024
- A recipe for online RLHF and online iterative DPO (☆539, updated Dec 28, 2024)
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning (☆21, updated Jul 9, 2023)
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models (☆454, updated Feb 1, 2024)
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" (☆39, updated Jan 12, 2024)
- Self-Alignment with Principle-Following Reward Models (☆169, updated Sep 18, 2025)
- [ICML 2024] Official repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment (☆57, updated Jun 16, 2024)
- ☆331, updated May 31, 2025
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs) (☆905, updated Sep 30, 2025)
- Official repository for ORPO (☆471, updated May 31, 2024)
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts (☆16, updated Feb 26, 2024)
- Extending context length of visual language models (☆12, updated Dec 18, 2024)
- Code and configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models (☆68, updated Apr 26, 2025)
- Scalable toolkit for efficient model alignment (☆852, updated Oct 6, 2025)
- ☆31, updated Oct 2, 2024
- GenRM-CoT: Data release for verification rationales (☆68, updated Oct 16, 2024)