bilal-chughtai / rep-theory-mech-interp
☆28 · May 4, 2023 · Updated 2 years ago
Alternatives and similar repositories for rep-theory-mech-interp
Users who are interested in rep-theory-mech-interp are comparing it to the libraries listed below
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆31 · Jan 31, 2023 · Updated 3 years ago
- ☆12 · Updated this week
- Code for "Automatic Circuit Finding and Faithfulness" ☆16 · Jul 11, 2024 · Updated last year
- Reimplementation of https://github.com/montemac/algebraic_value_editing in pure PyTorch for efficiency on large models ☆11 · Jun 28, 2023 · Updated 2 years ago
- ☆66 · Feb 16, 2023 · Updated 3 years ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆19 · Dec 14, 2024 · Updated last year
- computational physics (Chungbuk National University, Korea) ☆10 · May 26, 2022 · Updated 3 years ago
- ☆17 · Updated this week
- gpt completions in vscode ☆35 · Mar 24, 2023 · Updated 2 years ago
- ☆267 · Oct 1, 2024 · Updated last year
- Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model's internal units of computation, or "feat… ☆42 · Jul 14, 2025 · Updated 7 months ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 · Oct 18, 2024 · Updated last year
- ☆29 · Jul 17, 2023 · Updated 2 years ago
- Tools for studying developmental interpretability in neural networks. ☆126 · Dec 30, 2025 · Updated last month
- Official repo for the paper "Bilinear MLPs enable weight-based mechanistic interpretability". ☆28 · Aug 2, 2025 · Updated 6 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆100 · Sep 21, 2023 · Updated 2 years ago
- ☆27 · Oct 6, 2024 · Updated last year
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆25 · Apr 18, 2024 · Updated last year
- ☆29 · Oct 18, 2022 · Updated 3 years ago
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective" ☆32 · Jul 22, 2024 · Updated last year
- Visual Transformer Mechanistic Analysis Tool ☆35 · Jun 3, 2023 · Updated 2 years ago
- Sparse probing paper full code. ☆66 · Dec 17, 2023 · Updated 2 years ago
- ☆207 · Oct 14, 2025 · Updated 4 months ago
- Code associated to papers on superposition (in ML interpretability) ☆35 · Sep 13, 2022 · Updated 3 years ago
- Mechanistic Interpretability Visualizations using React ☆320 · Dec 18, 2024 · Updated last year
- Lecture notes and programming exercises carried out as part of the Computational Physics 1 course taught at Yachay Tech University. ☆23 · Updated this week
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆32 · Nov 4, 2024 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆90 · Dec 31, 2025 · Updated last month
- Auditing agents for fine-tuning safety ☆18 · Oct 21, 2025 · Updated 3 months ago
- ☆34 · Feb 29, 2024 · Updated last year
- ☆31 · Mar 27, 2025 · Updated 10 months ago
- The LM Contamination Index is a manually created database of contamination evidences for LMs. ☆82 · Apr 11, 2024 · Updated last year
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 10 months ago
- Situational Awareness Dataset ☆43 · Dec 14, 2024 · Updated last year
- “Replace your politicians with code.” — Home of the Popularis Direct Democracy Whitepaper. ☆11 · Oct 31, 2022 · Updated 3 years ago
- A Data-Driven Approach to Predict the Success of Bank Telemarketing ☆10 · Apr 27, 2021 · Updated 4 years ago
- Primus-SaFE (Stability and Fault Endurance) ☆50 · Updated this week
- Material for my course of Computational Physics (3rd semester, obligatory), in National and Kapodistrian University of Athens ☆15 · Jan 9, 2026 · Updated last month
- ☆36 · Jul 14, 2022 · Updated 3 years ago