JuliaGrosse / ults

Uncertainty-guided Likelihood Tree Search

☆9

Alternatives and similar repositories for ults

Users who are interested in ults are comparing it to the libraries listed below.

alexrame / rewardedsoups
Rewarded soups official implementation
☆58 Updated last year
cassidylaidlaw / hidden-context
Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"
☆30 Updated last year
liziniu / policy_optimization
Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)
☆28 Updated last year
linlu-qiu / lm-inductive-reasoning
☆34 Updated last year
facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆44 Updated last year
micahcarroll / uniMASK
Codebase for "Uni[MASK]: Unified Inference in Sequential Decision Problems"
☆56 Updated last year
gregorbachmann / Next-Token-Failures
☆90 Updated last year
ahjwang / messenger-emma
Implements the Messenger environment and EMMA model.
☆25 Updated 2 years ago
lfbo-ml / lfbo
Code for A General Recipe for Likelihood-free Bayesian Optimization, ICML 2022
☆44 Updated 3 years ago
thjashin / rodeo
Gradient Estimation with Discrete Stein Operators (NeurIPS 2022)
☆17 Updated last year
DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode…
☆18 Updated 8 months ago
erosenfeld / disagree_discrep
Provably (and non-vacuously) bounding test error of deep neural networks under distribution shift with unlabeled test data.
☆10 Updated last year
vuoristo / MMAML-rl
☆15 Updated 5 years ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 Updated 2 months ago
allenbai01 / transformers-as-statisticians
☆32 Updated 2 years ago
tlc4418 / llm_optimization
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
☆45 Updated 6 months ago
weigq / openview_quicklook
☆34 Updated 5 months ago
holarissun / RewardModelingBeyondBradleyTerry
Official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…
☆66 Updated 4 months ago
zhxieml / PDT
Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer
☆28 Updated 2 years ago
XanderJC / attention-based-credit
Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt…
☆35 Updated last year
google-deepmind / emergent_in_context_learning
☆84 Updated last year
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆30 Updated last year
IdoAmos / not-from-scratch
☆33 Updated 9 months ago
rosieyzh / openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
☆20 Updated 3 months ago
Jiacheng-Zhu-AIML / AsymmetryLoRA
Preprint: Asymmetry in Low-Rank Adapters of Foundation Models
☆35 Updated last year
prajjwal1 / rl_paradigm
☆17 Updated last year
tianjunz / TEMPERA
☆45 Updated 2 years ago
cassidylaidlaw / orpo
☆18 Updated 9 months ago
cmu-mind / RISE
☆32 Updated 9 months ago
GFNOrg / gfn-lm-tuning
☆184 Updated last year