☆59Jan 17, 2025Updated last year
Alternatives and similar repositories for matryoshka_sae
Users that are interested in matryoshka_sae are comparing it to the libraries listed below
- ☆27Nov 28, 2024Updated last year
- Improving Steering Vectors by Targeting Sparse Autoencoder Features☆27Nov 20, 2024Updated last year
- Implementation of the BatchTopK activation function for training sparse autoencoders (SAEs)☆61Jul 24, 2025Updated 7 months ago
- Trains Sparse Autoencoders based on outputs from language models☆11Oct 7, 2024Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …☆244Updated this week
- One-Shot Unsupervised Cross Domain Detection☆13Nov 22, 2022Updated 3 years ago
- Sparse Autoencoder Training Library☆55May 1, 2025Updated 10 months ago
- ☆15Nov 11, 2023Updated 2 years ago
- Sparsify transformers with SAEs and transcoders☆699Updated this week
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆183Sep 26, 2025Updated 5 months ago
- Engine for collecting, uploading, and downloading model activations☆26Apr 2, 2025Updated 11 months ago
- Group-conditional DRO to alleviate spurious correlations☆15Jul 15, 2021Updated 4 years ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"☆19Dec 14, 2024Updated last year
- Training Transformers with knowledge localization (SGTM)☆48Jan 11, 2026Updated last month
- [SatML 2024] Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk☆16Mar 15, 2025Updated 11 months ago
- Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation"☆30Oct 31, 2025Updated 4 months ago
- Applying SAEs for fine-grained control☆25Dec 15, 2024Updated last year
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs)☆25Dec 1, 2024Updated last year
- [ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Pr…☆24Jun 1, 2024Updated last year
- ☆24Aug 23, 2025Updated 6 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).☆247Feb 27, 2026Updated last week
- ☆105Oct 30, 2023Updated 2 years ago
- ☆23Jun 15, 2022Updated 3 years ago
- Code repo for the model organisms and convergent directions of EM papers.☆53Sep 22, 2025Updated 5 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…☆28May 23, 2024Updated last year
- ☆576Jul 19, 2024Updated last year
- Algebraic value editing in pretrained language models☆69Nov 1, 2023Updated 2 years ago
- ☆27Feb 15, 2025Updated last year
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)☆36Dec 17, 2024Updated last year
- Segment This Thing is an efficient image segmentation model that uses a biologically-inspired foveated tokenization to reduce inference …☆55Jun 16, 2025Updated 8 months ago
- ☆28Feb 27, 2025Updated last year
- Bidirectional Mapping between Action Physical-Semantic Space☆34Sep 7, 2025Updated 5 months ago
- Bogazici University - CMPE150 (Introduction to Computing C) lab notes☆11Dec 20, 2019Updated 6 years ago
- Implementation of NIPS2023: Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval☆11Nov 12, 2024Updated last year
- Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off☆33Apr 28, 2022Updated 3 years ago
- Code for reproducing our paper "Not All Language Model Features Are Linear"☆84Nov 27, 2024Updated last year
- NeurIPS 2025: Discriminative Constrained Optimization for Reinforcing Large Reasoning Models☆52Feb 3, 2026Updated last month
- [NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models☆64Nov 27, 2025Updated 3 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue☆38May 26, 2025Updated 9 months ago