understanding-search / maze-dataset
maze datasets for investigating OOD behavior of ML systems
☆44 Updated 2 weeks ago
Alternatives and similar repositories for maze-dataset
Users who are interested in maze-dataset are comparing it to the repositories listed below
- ☆83 Updated 9 months ago
- ☆92 Updated 3 months ago
- ☆92 Updated 10 months ago
- This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆43 Updated last year
- Universal Neurons in GPT2 Language Models ☆29 Updated 11 months ago
- Rewarded soups official implementation ☆57 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆64 Updated 3 weeks ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆15 Updated last month
- Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024) ☆36 Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 8 months ago
- ☆33 Updated last year
- ☆40 Updated last year
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 Updated 6 months ago
- ☆26 Updated last year
- ☆14 Updated last year
- Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-tur… ☆15 Updated 5 months ago
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆25 Updated 3 months ago
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆28 Updated 8 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆155 Updated 6 months ago
- ☆23 Updated 7 months ago
- ☆64 Updated 11 months ago
- ☆51 Updated last month
- ☆94 Updated last year
- ☆31 Updated last year
- ☆22 Updated 3 months ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆29 Updated last year
- ☆85 Updated last year
- ☆31 Updated 4 months ago
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆48 Updated 7 months ago
- Efficient empirical NTKs in PyTorch ☆18 Updated 2 years ago