aryamanarora / causalgym
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆51 · Nov 30, 2024 · Updated last year
Alternatives and similar repositories for causalgym
Users who are interested in causalgym are comparing it to the libraries listed below.
- ☆20 · Jun 6, 2025 · Updated 8 months ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated last year
- 👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark" ☆20 · Jan 19, 2024 · Updated 2 years ago
- [Kauf & Ivanova, ACL 2023] A Better Way to Do Masked Language Model Scoring ☆10 · Dec 1, 2023 · Updated 2 years ago
- Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models" ☆11 · Dec 14, 2022 · Updated 3 years ago
- ☆14 · May 24, 2022 · Updated 3 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆165 · Jun 25, 2025 · Updated 7 months ago
- AI-ready open dataset of e-commerce coupons, deals & redeem-links curated by Kindred ☆17 · May 2, 2025 · Updated 9 months ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- Debiasing Methods in Natural Language Understanding Make Bias More Accessible: Code and Data ☆14 · Apr 24, 2022 · Updated 3 years ago
- A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models. ☆106 · Oct 4, 2023 · Updated 2 years ago
- Machine Learning for Source Code Analysis ☆17 · Nov 20, 2023 · Updated 2 years ago
- Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆18 · Oct 7, 2025 · Updated 4 months ago
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- Make the Best of Cross-lingual Transfer: Evidence from POS Tagging with over 100 Languages (ACL 2022) ☆19 · May 17, 2022 · Updated 3 years ago
- ☆17 · Dec 21, 2023 · Updated 2 years ago
- Utility for behavioral and representational analyses of Language Models ☆177 · Feb 9, 2026 · Updated last week
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- ☆31 · Updated this week
- [ICLR 2025] ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains ☆17 · Mar 4, 2025 · Updated 11 months ago
- graphpatch is a library for activation patching on PyTorch neural network models. ☆20 · Feb 11, 2025 · Updated last year
- [FCS'24] LVLM Safety paper ☆19 · Jan 4, 2025 · Updated last year
- Word sense disambiguation test sets for NMT ☆20 · Dec 3, 2020 · Updated 5 years ago
- ☆207 · Oct 14, 2025 · Updated 4 months ago
- Landing page for MIB: A Mechanistic Interpretability Benchmark ☆24 · Aug 15, 2025 · Updated 6 months ago
- ☆394 · Aug 21, 2025 · Updated 5 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆296 · Nov 10, 2023 · Updated 2 years ago
- Code for our ACL '23 paper titled "Grokking of Hierarchical Structure in Vanilla Transformers" ☆24 · Oct 8, 2023 · Updated 2 years ago
- Sparse Autoencoder Training Library ☆56 · May 1, 2025 · Updated 9 months ago
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024) ☆25 · Sep 10, 2024 · Updated last year
- ☆25 · Dec 20, 2023 · Updated 2 years ago
- ☆30 · Aug 2, 2024 · Updated last year
- [EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning ☆21 · Jul 28, 2025 · Updated 6 months ago
- ☆29 · Dec 2, 2024 · Updated last year
- ☆34 · Feb 20, 2025 · Updated 11 months ago
- ☆70 · Oct 27, 2020 · Updated 5 years ago
- A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings) ☆26 · Sep 12, 2021 · Updated 4 years ago
- This repository contains all new resources that were created for the NAACL-2018 paper "Inducing a Lexicon of Abusive Words -- A Feature-B… ☆29 · Mar 14, 2019 · Updated 6 years ago
- Combating hidden stratification with GEORGE ☆64 · May 18, 2021 · Updated 4 years ago