Open-Social-World / autolibra
AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback
☆16 · Updated last month
Alternatives and similar repositories for autolibra
Users interested in autolibra are comparing it to the libraries listed below.
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆39 · Updated 2 months ago
- ☆20 · Updated 2 weeks ago
- ☆10 · Updated 2 years ago
- ☆32 · Updated 6 months ago
- ☆19 · Updated last year
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives" ☆27 · Updated last year
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆69 · Updated 7 months ago
- ☆13 · Updated 4 months ago
- ☆30 · Updated last year
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World" ☆27 · Updated 3 months ago
- ☆25 · Updated 7 months ago
- ☆17 · Updated 3 months ago
- Official implementation of "Bootstrapping Language Models via DPO Implicit Rewards" ☆44 · Updated 7 months ago
- MUA-RL: Multi-Turn User-Interacting Agent Reinforcement Learning for Agentic Tool Use ☆44 · Updated 2 weeks ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Updated 7 months ago
- [NeurIPS 2024] Official code of "$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$" ☆49 · Updated last year
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆31 · Updated 9 months ago
- ☆15 · Updated last year
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆28 · Updated last year
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning ☆23 · Updated last month
- Plancraft is a Minecraft environment and agent suite for testing the planning capabilities of LLMs ☆21 · Updated last week
- ☆49 · Updated 3 months ago
- Official implementation of Rewarded Soups ☆62 · Updated 2 years ago
- Sotopia-RL: Reward Design for Social Intelligence ☆43 · Updated 3 months ago
- ☆21 · Updated 2 months ago
- [ICML 2025] Official code of "AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization" ☆22 · Updated last year
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆10 · Updated last year
- Code for the paper "Preserving Diversity in Supervised Fine-tuning of Large Language Models" ☆47 · Updated 6 months ago
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization" ☆18 · Updated last year
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆17 · Updated 5 months ago