Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆31Apr 22, 2025Updated 10 months ago
Alternatives and similar repositories for arrakis
Users that are interested in arrakis are comparing it to the libraries listed below
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…☆10Oct 7, 2024Updated last year
- A library for training crosscoders☆16May 28, 2025Updated 9 months ago
- A tiny easily hackable implementation of a feature dashboard.☆15Oct 21, 2025Updated 4 months ago
- Tools for optimizing steering vectors in LLMs.☆20Apr 10, 2025Updated 11 months ago
- ☆17Jul 9, 2025Updated 8 months ago
- A benchmark for mechanistic discovery of circuits in Transformers☆16Dec 15, 2024Updated last year
- Engine for collecting, uploading, and downloading model activations☆26Apr 2, 2025Updated 11 months ago
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works…☆20May 29, 2024Updated last year
- ☆20Apr 10, 2025Updated 10 months ago
- ☆25Feb 20, 2026Updated 2 weeks ago
- ☆20Feb 17, 2023Updated 3 years ago
- ☆58Nov 19, 2024Updated last year
- 🪄 Interpreto is an interpretability toolbox for LLMs☆147Updated this week
- A repository of 2025-2026 AI Safety and Alignment programs, camps, and workshops.☆21May 18, 2025Updated 9 months ago
- Unified access to Large Language Model modules using NNsight☆101Feb 28, 2026Updated last week
- Model zoo for different kinds of uncertainty quantification methods used in Natural Language Processing, implemented in PyTorch.☆55May 5, 2023Updated 2 years ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.☆66Updated this week
- ☆13Oct 5, 2025Updated 5 months ago
- A library for efficient patching and automatic circuit discovery.☆91Dec 31, 2025Updated 2 months ago
- Attribution-based Parameter Decomposition☆34Jun 11, 2025Updated 8 months ago
- ☆84Feb 25, 2025Updated last year
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models☆12May 30, 2025Updated 9 months ago
- Mutual Fund Analysis Dashboard using Python, Excel, and Power BI | Top 30 Low-Risk High-Return Schemes Identified☆31Feb 2, 2026Updated last month
- An algorithm that intelligently executes a crypto order over time via Coinbase☆12Oct 26, 2021Updated 4 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"☆47May 31, 2024Updated last year
- ☆52Oct 23, 2023Updated 2 years ago
- Open source interpretability artefacts for R1.☆172Apr 21, 2025Updated 10 months ago
- ☆14Apr 29, 2025Updated 10 months ago
- Yet another tool to search through your (exported) ChatGPT conversations☆13Dec 24, 2025Updated 2 months ago
- An efficient 40% keyboard layout☆11Dec 30, 2023Updated 2 years ago
- Geodesic updates for Nickel & Kiela's graph embedding algorithm in hyperbolic space.☆11Jun 6, 2018Updated 7 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs☆14Jan 1, 2025Updated last year
- Code repository supporting the paper "Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segment…☆11Apr 29, 2024Updated last year
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch☆10Aug 7, 2024Updated last year
- CFD case for simulation of RD107 rocket engine☆12Sep 17, 2025Updated 5 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)☆12Oct 31, 2024Updated last year
- ppx_system is a syntax extension to know the operating system at compile time☆12May 9, 2023Updated 2 years ago
- grafana, prometheus, alertmanager, node-exporter, cadvisor, alertmanager-bot for telegram in docker-compose and awesome grafana dashboard☆11Apr 19, 2023Updated 2 years ago
- Neural network sequence labeling model☆11Dec 28, 2019Updated 6 years ago