understanding-search / maze-dataset
maze datasets for investigating OOD behavior of ML systems
☆46 · Updated last week
Alternatives and similar repositories for maze-dataset
Users interested in maze-dataset are comparing it to the libraries listed below.
- Rewarded Soups official implementation ☆58 · Updated last year
- ☆93 · Updated 11 months ago
- Code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity ☆43 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆65 · Updated last month
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆16 · Updated last month
- Reinforcement Learning via Regressing Relative Rewards ☆32 · Updated 5 months ago
- ☆40 · Updated last year
- Official PyTorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP" ☆39 · Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 8 months ago
- ☆93 · Updated 3 months ago
- ☆85 · Updated last year
- Official PyTorch implementation of the Longhorn Deep State Space Model ☆50 · Updated 6 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆104 · Updated 2 months ago
- ☆14 · Updated last year
- ☆51 · Updated last month
- What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆31 · Updated last month
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. ☆43 · Updated 4 months ago
- Universal Neurons in GPT2 Language Models ☆29 · Updated last year
- ☆23 · Updated 4 months ago
- Official repository for the paper Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆16 · Updated 6 months ago
- A repo built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆29 · Updated 9 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆159 · Updated last week
- ☆27 · Updated 9 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆85 · Updated 7 months ago
- BASALT Benchmark datasets, evaluation code, and agent training example. ☆20 · Updated last year
- Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-tur… ☆15 · Updated 6 months ago
- ☆31 · Updated last year
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆28 · Updated 4 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆107 · Updated last year
- Official code for the paper Probing the Decision Boundaries of In-context Learning in Large Language Models. https://arxiv.org/abs/2406.11233… ☆18 · Updated 9 months ago