ayyucekizrak / Mechanistic-Interpretability
Mechanistic Interpretability in Transformers: This repository explores advanced techniques like Induction Head Detection and QK Circuit Analysis to uncover the inner workings of transformer-based models.
☆28 Updated last year
Alternatives and similar repositories for Mechanistic-Interpretability
Users who are interested in Mechanistic-Interpretability compare it to the libraries listed below.
- A curated list of Turkish AI models, datasets, papers ☆45 Updated last month
- ☆202 Updated 10 months ago
- ☆54 Updated 11 months ago
- Developing an advanced tokenizer infrastructure to process multilingual texts based on linguistic rules and preserve semantic integrity… ☆15 Updated 2 months ago
- ☆45 Updated 4 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆216 Updated last week
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆147 Updated 2 weeks ago
- ☆348 Updated last month
- ☆81 Updated 7 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆374 Updated 11 months ago
- ☆175 Updated 11 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆59 Updated 11 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆330 Updated 11 months ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ☆258 Updated 2 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆221 Updated 10 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆136 Updated 3 months ago
- Open source interpretability artefacts for R1. ☆161 Updated 5 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆86 Updated last month
- https://transformer-circuits.pub/2025/attribution-graphs/methods.html ☆86 Updated 6 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆26 Updated 8 months ago
- ☆73 Updated last week
- Decoder only transformer, built from scratch with PyTorch ☆31 Updated last year
- Sparsify transformers with SAEs and transcoders ☆640 Updated this week
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). ☆311 Updated 2 months ago
- Mechanistic Interpretability Visualizations using React ☆291 Updated 10 months ago
- ☆20 Updated 6 months ago
- Sparse Autoencoder Training Library ☆55 Updated 5 months ago
- ☆244 Updated last year
- Using sparse coding to find distributed representations used by neural networks. ☆274 Updated last year
- Modified to support crosscoder training. ☆23 Updated last week