yash-srivastava19 / arrakis

Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.

☆31

Alternatives and similar repositories for arrakis

Users interested in arrakis compare it to the libraries listed below.

ApolloResearch / apd
Attribution-based Parameter Decomposition
☆32 · Updated 5 months ago
tilde-research / activault
Engine for collecting, uploading, and downloading model activations
☆24 · Updated 7 months ago
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆75 · Updated 2 years ago
mega002 / llm-interp-tau
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
☆201 · Updated last week
google-deepmind / mishax
☆143 · Updated 2 months ago
Cohere-Labs-Community / AI-Alignment-Cohort
☆29 · Updated last year
ARBORproject / arborproject.github.io
☆83 · Updated 9 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆60 · Updated last year
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆228 · Updated 11 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 · Updated 6 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆129 · Updated 9 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163 · Updated 7 months ago
nostalgebraist / transformer-utils
Utilities for the HuggingFace transformers library
☆72 · Updated 2 years ago
KihoPark / LLM_Categorical_Hierarchical_Representations
☆111 · Updated 9 months ago
neelnanda-io / 1L-Sparse-Autoencoder
☆132 · Updated 2 years ago
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 · Updated 11 months ago
jxmorris12 / cde
Code for training and evaluating Contextual Document Embedding models
☆200 · Updated 6 months ago
Butanium / nnterp
Unified access to Large Language Model modules using NNsight
☆60 · Updated last week
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130 · Updated 3 years ago
annahdo / implementing_activation_steering
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆19 · Updated last year
thestephencasper / everything-you-need
we got you bro
☆36 · Updated last year
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248 · Updated last year
callummcdougall / sae_visualizer
☆29 · Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆226 · Updated last week
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆110 · Updated 3 weeks ago
jonhue / activeft
PyTorch library for Active Fine-Tuning
☆95 · Updated 2 months ago
mishajw / repeng
Experiments with representation engineering
☆13 · Updated last year
srush / GPTWorld
A puzzle to learn about prompting
☆135 · Updated 2 years ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆239 · Updated 10 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆302 · Updated 11 months ago