aypan17/reward-misspecification

☆10

Alternatives and similar repositories for reward-misspecification

Users who are interested in reward-misspecification are comparing it to the repositories listed below.

yudasong / briee
Representation Learning in RL
☆13 · Jun 1, 2022 · Updated 4 years ago
ronentk / dbca-splitter
Independent implementation of the DBCA method from http://arxiv.org/abs/1912.09713
☆11 · Nov 25, 2020 · Updated 5 years ago
yaodongyu / ProjNorm
Predicting Out-of-Distribution Error with the Projection Norm
☆19 · Jul 27, 2022 · Updated 3 years ago
zhao-ht / ConvexCertify
Code for our work CISS, "Certified Robustness Against Natural Language Attacks by Causal Intervention", published at ICML 2022
☆11 · Dec 6, 2022 · Updated 3 years ago
lchen001 / HAPI
☆16 · Nov 30, 2022 · Updated 3 years ago
rddy / ReQueST
Code for the paper "Learning Human Objectives by Evaluating Hypothetical Behavior"
☆86 · Dec 13, 2019 · Updated 6 years ago
INK-USC / RiddleSense
RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge
☆13 · Oct 20, 2021 · Updated 4 years ago
quantified-uncertainty / ai-safety-papers
☆22 · Sep 9, 2021 · Updated 4 years ago
ben-eysenbach / info_geometry
Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning"
☆20 · Oct 6, 2021 · Updated 4 years ago
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year
UnstoppableCurry / RWKV-LM-Interpretability-Research
Interpretability analysis of language model outliers and attempts to distill the model
☆13 · May 8, 2023 · Updated 3 years ago
milesaturpin / cot-unfaithfulness
☆57 · Oct 23, 2023 · Updated 2 years ago
gorogoroyasu / mnist-Grad-CAM
Implementation of Grad-CAM (https://arxiv.org/abs/1610.02391) for the MNIST dataset in Keras
☆12 · Nov 25, 2018 · Updated 7 years ago
RPC2 / PPO
A concise PyTorch implementation of Proximal Policy Optimization (PPO) solving CartPole-v0
☆16 · Jun 11, 2020 · Updated 6 years ago
nishadsinghi / CleanCLIP
Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023
☆40 · Oct 16, 2025 · Updated 9 months ago
yzhang511 / neural_decoding
Official implementation of RRR Decoder
☆14 · Mar 30, 2026 · Updated 3 months ago
siddk / lila
Code & Experiments for "LILA: Language-Informed Latent Actions" to be presented at the Conference on Robot Learning (CoRL) 2021.
☆13 · Nov 4, 2021 · Updated 4 years ago
1QB-Information-Technologies / NEM
Neural Error Mitigation of Near-Term Quantum Simulations (arXiv:2105.08086)
☆10 · Jul 6, 2022 · Updated 4 years ago
kaixin96 / mixreg
Code for our NeurIPS 2020 paper "Improving Generalization in Reinforcement Learning with Mixture Regularization"
☆34 · Oct 22, 2020 · Updated 5 years ago
zzzace2000 / robust_cls_model
Code to reproduce the CVPR 2021 paper "Towards Robust Classification Model by Counterfactual and Invariant Data Generation"
☆16 · Jul 29, 2021 · Updated 4 years ago
RPC2 / DQN_PyTorch
PyTorch implementation of DQN
☆13 · Sep 27, 2019 · Updated 6 years ago
tinyfpga / TinyFPGA-SoC
Open-source building blocks for TinyFPGA microcontrollers and retro computers.
☆18 · Sep 29, 2017 · Updated 8 years ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆15 · Apr 15, 2022 · Updated 4 years ago
yawen-d / Neural-Network-on-MNIST-with-NumPy-from-Scratch
Implement and train a neural network from scratch in Python for the MNIST dataset (no PyTorch).
☆14 · Mar 22, 2021 · Updated 5 years ago
alecGraves / DATA
Contains data to be used with machine learning
☆10 · Jul 28, 2017 · Updated 8 years ago
rdiaz02 / varSelRF
☆12 · Jan 31, 2026 · Updated 5 months ago
ClimateMind / climatemind-frontend
The Climate Mind team is building a web app to help stop climate change by empowering individuals to have better conversations about it a…
☆18 · May 6, 2025 · Updated last year
shiv213 / ImposterBot
An open-source Discord bot to enhance your Among Us experience
☆12 · Mar 2, 2025 · Updated last year
LAION-AI / scaling-laws-for-comparison
☆22 · May 12, 2026 · Updated 2 months ago
kevinyaobytedance / llm_eval
LLM evaluation.
☆16 · Nov 7, 2023 · Updated 2 years ago
Farama-Foundation / CrowdPlay
A web-based platform for collecting human actions in reinforcement learning environments
☆31 · Sep 10, 2025 · Updated 10 months ago
IlyaLab / rf-ace
(Backup fork, since Google Code is going down)
☆12 · Mar 12, 2015 · Updated 11 years ago
ynx0 / airlock
Communicate with an Urbit ship over the eyre protocol in Java
☆14 · Aug 12, 2021 · Updated 4 years ago
sarthakbagaria / has-sci
A collection of computational methods in science.
☆12 · Jan 6, 2017 · Updated 9 years ago
kawu / nerf
Named entity recognition tool based on linear-chain CRFs
☆16 · Dec 3, 2019 · Updated 6 years ago
HumanCompatibleAI / seals
Benchmark environments for reward modelling and imitation learning algorithms.
☆47 · Sep 19, 2023 · Updated 2 years ago
Attila94 / CODaN
Common Objects Day and Night image dataset.
☆15 · Nov 16, 2022 · Updated 3 years ago
lns / memoire
☆18 · Apr 17, 2019 · Updated 7 years ago
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆40 · Jun 10, 2024 · Updated 2 years ago