CausalGym: Benchmarking causal interpretability methods on linguistic tasks
☆51 · Nov 30, 2024 · Updated last year
Alternatives and similar repositories for causalgym
Users who are interested in causalgym are comparing it to the libraries listed below.
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated last year
- ☆20 · Jun 6, 2025 · Updated 9 months ago
- Code and data for "A fine-grained comparison of pragmatic language understanding in humans and language models" ☆11 · Dec 14, 2022 · Updated 3 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆171 · Updated this week
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- Code and data for "A Systematic Assessment of Syntactic Generalization in Neural Language Models" ☆29 · Jun 18, 2021 · Updated 4 years ago
- ☆22 · Mar 31, 2022 · Updated 3 years ago
- Debiasing Methods in Natural Language Understanding Make Bias More Accessible: Code and Data ☆14 · Apr 24, 2022 · Updated 3 years ago
- (NAACL 2024) Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations ☆15 · Apr 14, 2025 · Updated 10 months ago
- Making a bridge between NLP models and Brain data ☆19 · Jun 3, 2020 · Updated 5 years ago
- ☆32 · Feb 15, 2026 · Updated 3 weeks ago
- ☆15 · Apr 10, 2018 · Updated 7 years ago
- PyTorch and NNsight implementation of AtP* (Kramar et al., 2024, DeepMind) ☆20 · Jan 19, 2025 · Updated last year
- [ICLR 2025] ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains ☆16 · Mar 4, 2025 · Updated last year
- graphpatch is a library for activation patching on PyTorch neural network models. ☆21 · Feb 11, 2025 · Updated last year
- Word sense disambiguation test sets for NMT ☆20 · Dec 3, 2020 · Updated 5 years ago
- [FCS'24] LVLM Safety paper ☆19 · Jan 4, 2025 · Updated last year
- ☆209 · Oct 14, 2025 · Updated 4 months ago
- ☆22 · May 7, 2025 · Updated 10 months ago
- ☆140 · Aug 4, 2024 · Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆297 · Nov 10, 2023 · Updated 2 years ago
- Code for our ACL '23 paper titled "Grokking of Hierarchical Structure in Vanilla Transformers" ☆24 · Oct 8, 2023 · Updated 2 years ago
- Sparse Autoencoder Training Library ☆55 · May 1, 2025 · Updated 10 months ago
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS 2024) ☆26 · Sep 10, 2024 · Updated last year
- ☆30 · Aug 2, 2024 · Updated last year
- ☆25 · Dec 20, 2023 · Updated 2 years ago
- [EMNLP 2024] Ask-before-Plan: Proactive Language Agents for Real-World Planning ☆21 · Jul 28, 2025 · Updated 7 months ago
- ☆33 · Updated this week
- ☆273 · Oct 1, 2024 · Updated last year
- ☆35 · Feb 20, 2025 · Updated last year
- ☆70 · Oct 27, 2020 · Updated 5 years ago
- A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings) ☆27 · Sep 12, 2021 · Updated 4 years ago
- SDLC Copilot is an Agentic AI system designed to streamline and automate the Software Development Lifecycle (SDLC). From requirement gath… ☆23 · Jun 14, 2025 · Updated 8 months ago
- ☆12 · Nov 3, 2024 · Updated last year
- ☆13 · Oct 5, 2025 · Updated 5 months ago
- Examples of prompts that cause ChatGPT-4 to hallucinate. ☆32 · Jul 22, 2023 · Updated 2 years ago
- Attribution-based Parameter Decomposition ☆34 · Jun 11, 2025 · Updated 8 months ago
- The evaluation pipeline for the 2024 BabyLM Challenge. ☆33 · Nov 13, 2024 · Updated last year
- Official Code and Data repository of our ACL 2021 paper X-FACT: A New Benchmark Dataset for Multilingual Fact Checking. ☆27 · Oct 4, 2024 · Updated last year