science-of-finetuning / sparsity-artifacts-crosscoders
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆14 · Updated 2 weeks ago
Alternatives and similar repositories for sparsity-artifacts-crosscoders
Users interested in sparsity-artifacts-crosscoders are comparing it to the libraries listed below.
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… · ☆28 · Updated last year
- ☆49 · Updated 9 months ago
- ☆23 · Updated 9 months ago
- Sparse Autoencoder Training Library · ☆55 · Updated 5 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" · ☆42 · Updated last year
- ☆16 · Updated last year
- ☆103 · Updated last year
- ☆108 · Updated 8 months ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… · ☆19 · Updated 11 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" · ☆83 · Updated 11 months ago
- A library for efficient patching and automatic circuit discovery · ☆78 · Updated 3 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] · ☆68 · Updated last year
- Code repository for the model organisms and convergent directions of emergent misalignment (EM) papers · ☆33 · Updated last month
- Sparse and discrete interpretability tool for neural networks · ☆64 · Updated last year
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning" · ☆61 · Updated 2 months ago
- ☆108 · Updated 2 years ago
- ☆12 · Updated last year
- ☆92 · Updated last year
- Universal Neurons in GPT2 Language Models · ☆30 · Updated last year
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆27 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) · ☆80 · Updated 2 years ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) · ☆33 · Updated 2 years ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) · ☆29 · Updated last month
- Official code for the paper "Attention as a Hypernetwork" · ☆44 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" · ☆84 · Updated last year
- ☆45 · Updated last week
- Code release for the "Broken Neural Scaling Laws" (BNSL) paper · ☆59 · Updated 2 years ago
- ☆17 · Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs · ☆56 · Updated last year
- ☆33 · Updated last year