yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆29 · Updated 2 weeks ago
Alternatives and similar repositories for arrakis:
Users interested in arrakis are comparing it to the libraries listed below.
- An introduction to LLM Sampling ☆77 · Updated 4 months ago
- Simple repository for training small reasoning models ☆27 · Updated 3 months ago
- we got you bro ☆35 · Updated 9 months ago
- PyTorch implementation for MRL ☆18 · Updated last year
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 · Updated last year
- Engine for collecting, uploading, and downloading model activations ☆15 · Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 8 months ago
- PyTorch library for Active Fine-Tuning ☆68 · Updated 2 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated 2 weeks ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆46 · Updated 3 weeks ago
- Prune transformer layers ☆69 · Updated 11 months ago
- Sparse and discrete interpretability tool for neural networks ☆61 · Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆42 · Updated 5 months ago
- Training code for Sparse Autoencoders on Embedding models ☆38 · Updated 2 months ago
- Open source interpretability artefacts for R1. ☆103 · Updated 2 weeks ago
- PyTorch and NNsight implementation of AtP* (Kramar et al. 2024, DeepMind) ☆18 · Updated 3 months ago
- 🧠 Starter templates for doing interpretability research ☆70 · Updated last year
- gzip Predicts Data-dependent Scaling Laws ☆34 · Updated 11 months ago
- Sparse Autoencoder Training Library ☆49 · Updated this week
- Experiments with representation engineering ☆11 · Updated last year
- Simple GRPO scripts and configurations. ☆58 · Updated 3 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆73 · Updated 5 months ago