science-of-finetuning / sparsity-artifacts-crosscoders
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆11 · Updated 2 weeks ago
Alternatives and similar repositories for sparsity-artifacts-crosscoders
Users interested in sparsity-artifacts-crosscoders are comparing it to the repositories listed below.
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆27 · Updated last year
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 · Updated last year
- Sparse Autoencoder Training Library ☆53 · Updated 2 months ago
- ☆23 · Updated 5 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆75 · Updated 8 months ago
- Self-Supervised Alignment with Mutual Information ☆20 · Updated last year
- ☆45 · Updated last year
- The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization". ☆33 · Updated last month
- ☆87 · Updated last year
- ☆14 · Updated last year
- ☆20 · Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 · Updated 9 months ago
- [Preprint] AdaVAE: Exploring Adaptive GPT-2s in VAEs for Language Modeling PyTorch Implementation ☆35 · Updated last year
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆16 · Updated 2 years ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆80 · Updated last year
- ☆32 · Updated 8 months ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆30 · Updated 2 years ago
- ☆26 · Updated 2 years ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · Updated 2 years ago
- Sparse and discrete interpretability tool for neural networks ☆63 · Updated last year
- ☆27 · Updated 5 months ago
- Official code for the paper: "Metadata Archaeology" ☆19 · Updated 2 years ago
- ☆20 · Updated last year
- Deep Networks Grok All the Time and Here is Why ☆37 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆70 · Updated 2 months ago
- PyTorch and NNsight implementation of AtP* (Kramar et al. 2024, DeepMind) ☆18 · Updated 5 months ago
- ☆100 · Updated 5 months ago
- This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles". ☆23 · Updated 3 months ago
- ☆20 · Updated 2 years ago
- ☆84 · Updated 11 months ago