aypan17 / reward-misspecification
☆11 · Mar 13, 2023 · Updated 2 years ago
Alternatives and similar repositories for reward-misspecification
Users who are interested in reward-misspecification are comparing it to the libraries listed below
- Representation Learning in RL ☆13 · Jun 1, 2022 · Updated 3 years ago
- AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback ☆17 · Oct 15, 2025 · Updated 4 months ago
- This is the code of our work CISS Certified Robustness Against Natural Language Attacks by Causal Intervention published on ICML 2022 ☆11 · Dec 6, 2022 · Updated 3 years ago
- RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge ☆14 · Oct 20, 2021 · Updated 4 years ago
- ☆18 · Mar 19, 2025 · Updated 10 months ago
- ☆20 · Mar 3, 2025 · Updated 11 months ago
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- Code & Experiments for "LILA: Language-Informed Latent Actions" to be presented at the Conference on Robot Learning (CoRL) 2021. ☆14 · Nov 4, 2021 · Updated 4 years ago
- Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023 ☆39 · Oct 16, 2025 · Updated 3 months ago
- ☆17 · Nov 30, 2022 · Updated 3 years ago
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models ☆19 · May 24, 2025 · Updated 8 months ago
- Code for our NeurIPS 2020 paper Improving Generalization in Reinforcement Learning with Mixture Regularization ☆35 · Oct 22, 2020 · Updated 5 years ago
- The code to reproduce CVPR 2021 paper "Towards Robust Classification Model by Counterfactual and Invariant Data Generation" ☆17 · Jul 29, 2021 · Updated 4 years ago
- Code to accompany the paper "The Information Geometry of Unsupervised Reinforcement Learning" ☆20 · Oct 6, 2021 · Updated 4 years ago
- ☆20 · Nov 4, 2025 · Updated 3 months ago
- Code for the paper, "Learning Human Objectives by Evaluating Hypothetical Behavior" ☆84 · Dec 13, 2019 · Updated 6 years ago
- ☆19 · Jan 21, 2023 · Updated 3 years ago
- ☆21 · Dec 17, 2020 · Updated 5 years ago
- ☆18 · Apr 17, 2019 · Updated 6 years ago
- Official code and data repository of MathChat: MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte… ☆22 · Jun 3, 2024 · Updated last year
- ☆22 · Sep 9, 2021 · Updated 4 years ago
- Reproducible Language Agent Research ☆33 · Jun 25, 2025 · Updated 7 months ago
- Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters. ☆25 · Sep 26, 2020 · Updated 5 years ago
- This repository provides a PyTorch implementation of "Fooling Neural Network Interpretations via Adversarial Model Manipulation". Our pap… ☆23 · Dec 19, 2020 · Updated 5 years ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆13 · Jun 28, 2025 · Updated 7 months ago
- Source code of paper: A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models. (ICML 2025) ☆35 · Apr 2, 2025 · Updated 10 months ago
- A Text2SQL benchmark for evaluation of Large Language Models ☆41 · Feb 8, 2026 · Updated last week
- DataSciBench: An LLM Agent Benchmark for Data Science ☆50 · Jan 21, 2026 · Updated 3 weeks ago
- The Arcade Learning Environment (ALE) -- a platform for AI research. ☆24 · Sep 18, 2024 · Updated last year
- Official Code Release for "Training a Generally Curious Agent" ☆44 · May 18, 2025 · Updated 8 months ago
- Modelling epidemiological dynamics and performing inference in these models ☆27 · Jul 30, 2021 · Updated 4 years ago
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆14 · Feb 25, 2025 · Updated 11 months ago
- A web based platform for collecting human actions in reinforcement learning environments ☆31 · Sep 10, 2025 · Updated 5 months ago
- ☆35 · Jul 5, 2023 · Updated 2 years ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios ☆16 · Oct 18, 2024 · Updated last year
- Predictable MDP Abstraction for Unsupervised Model-Based RL (ICML 2023) ☆32 · Feb 6, 2023 · Updated 3 years ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆33 · Dec 14, 2023 · Updated 2 years ago