cassidylaidlaw/hidden-context

Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"

☆35

Alternatives and similar repositories for hidden-context

Users that are interested in hidden-context are comparing it to the libraries listed below.

dsbrown1331 / bayesianrex
☆21 · Dec 17, 2020 · Updated 5 years ago
Cranial-XIX / BOME
☆16 · Apr 12, 2023 · Updated 3 years ago
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆48 · Mar 25, 2025 · Updated last year
HumanCompatibleAI / learning-from-human-preferences
Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
☆31 · Jul 27, 2021 · Updated 4 years ago
vivekmyers / contrastive_metrics
Code for the paper "Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making"
☆29 · Jul 11, 2024 · Updated 2 years ago
alexrame / rewardedsoups
Rewarded soups official implementation
☆64 · Sep 27, 2023 · Updated 2 years ago
luchris429 / model-free-opponent-shaping
Code for Model-Free Opponent Shaping (ICML 2022)
☆24 · Nov 18, 2022 · Updated 3 years ago
voot-t / vild_code
Source code of "Variational Imitation Learning with Diverse-quality Demonstrations" in ICML 2020. This github repository includes python …
☆20 · Aug 16, 2021 · Updated 4 years ago
koayon / phil-interp-papers
A curated reading list for researchers in the Philosophy of Interpretability
☆17 · Aug 17, 2025 · Updated 11 months ago
vwxyzjn / summarize_from_feedback_details
☆164 · Nov 23, 2024 · Updated last year
NeuralMMO / baselines
Baselines for Neural MMO -- new users should treat this repo as a starter project
☆52 · Jul 29, 2024 · Updated last year
rgreenblatt / model_organism_public
☆15 · Jun 17, 2025 · Updated last year
aditimavalankar / option-keyboard
PyTorch implementation of "The Option Keyboard: Combining Skills in Reinforcement Learning" (NeurIPS 2019)
☆12 · Jul 2, 2020 · Updated 6 years ago
facebookresearch / GAN-optimization-landscape
code to reproduce the empirical results in the research paper
☆40 · Oct 12, 2021 · Updated 4 years ago
dunzeng / MORE
Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment
☆16 · Aug 6, 2024 · Updated last year
cassidylaidlaw / effective-horizon
Code and data for the paper "Bridging RL Theory and Practice with the Effective Horizon"
☆50 · Jun 26, 2024 · Updated 2 years ago
Will-Nie / AutoLinePlotter
This repo supports automatic line plots for multi-seed event files from TensorBoard
☆12 · Jun 23, 2022 · Updated 4 years ago
kuanhenglin / ddim-inversion
UCLA CS 188 (Winter 2023) course project.
☆12 · Mar 31, 2023 · Updated 3 years ago
s-ball-10 / jailbreak_dynamics
☆25 · Jun 13, 2024 · Updated 2 years ago
Stilwell-Git / Randomized-Return-Decomposition
TensorFlow implementation for our paper "Learning Long-Term Reward Redistribution via Randomized Return Decomposition"
☆19 · Mar 17, 2022 · Updated 4 years ago
exporl / vlaai
Decoding of the speech envelope from EEG using the VLAAI deep neural network
☆14 · Sep 28, 2022 · Updated 3 years ago
brain-research / mirage-rl
Code to reproduce the experiments in The Mirage of Action-Dependent Baselines in Reinforcement Learning.
☆17 · Aug 2, 2018 · Updated 7 years ago
kvsnoufal / reinforce
Implementing the REINFORCE algorithm on Pong, Lunar Lander and CartPole + Medium article
☆23 · Nov 24, 2020 · Updated 5 years ago
ZHZisZZ / emulated-disalignment
[ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
☆39 · Aug 2, 2024 · Updated last year
Wizardcoast / Linear_Alignment
Reproduction resources for the linear alignment paper; still a work in progress
☆17 · May 19, 2024 · Updated 2 years ago
ReedZyd / GenerativeReturnDecomposition
Source code for Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach (NeurIPS 2023)
☆10 · Dec 12, 2023 · Updated 2 years ago
OpenLLMAI / OpenLLMDE
OpenLLMDE: An open source data engineering framework for LLMs
☆18 · Sep 9, 2023 · Updated 2 years ago
IBM / kstar
K* search based implementation of top-k and top-quality planners
☆19 · Apr 1, 2026 · Updated 3 months ago
StanfordASL / RSIRL
Risk-sensitive Inverse Reinforcement Learning
☆11 · Sep 11, 2019 · Updated 6 years ago
bethgelab / delta-belief-rl
Official implementation of the ΔBelief-RL method.
☆31 · Feb 28, 2026 · Updated 4 months ago
SafeRL-Lab / Robust-RL-Baselines
Robust Reinforcement Learning Benchmark
☆13 · Sep 22, 2024 · Updated last year
JoshuaDavid / Neighbor_Joining
Python neighbor-joining library. Goal: Efficient O(n^2) neighbor-joining algorithm.
☆12 · May 5, 2014 · Updated 12 years ago
taodav / nsrs
Code for the paper Novelty Search in Representational Space for Sample Efficient Exploration presented at NeurIPS 2020.
☆14 · Jul 16, 2024 · Updated 2 years ago
rainavyas / attack-comparative-assessment
Adversarial attack comparative assessment Large Language Model
☆13 · May 21, 2025 · Updated last year
Jasonxu1225 / Awesome-Constraint-Inference-in-RL
[TMLR 2025] A collection of research papers on constraint inference within the field of RL
☆11 · May 9, 2025 · Updated last year
tsachiblau / Threat-Model-Agnostic-Adversarial-Defense-using-Diffusion-Models
☆12 · Jul 19, 2022 · Updated 4 years ago
AI-secure / multi-task-learning
Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxi…
☆67 · Oct 18, 2021 · Updated 4 years ago
autodistill / autodistill-dinov2
DINOv2 module for use with Autodistill.
☆16 · Dec 6, 2023 · Updated 2 years ago
staghuntrpg / RPG
This is the source code of RPG (Reward-Randomized Policy Gradient)
☆42 · Sep 1, 2022 · Updated 3 years ago