apple / ml-selfreflect
☆40 Updated 4 months ago
Alternatives and similar repositories for ml-selfreflect
Users who are interested in ml-selfreflect are comparing it to the libraries listed below:
- ☆115 Updated 11 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆42 Updated 3 weeks ago
- ☆25 Updated last year
- Universal Neurons in GPT2 Language Models ☆30 Updated last year
- ☆33 Updated last year
- PyTorch library for Active Fine-Tuning ☆96 Updated 4 months ago
- 👋 Overcomplete is a Vision-based SAE Toolbox ☆119 Updated 2 months ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆32 Updated 4 months ago
- Sparse Autoencoder Training Library ☆56 Updated 9 months ago
- ☆24 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆163 Updated 7 months ago
- A library for efficient patching and automatic circuit discovery. ☆88 Updated last month
- ☆58 Updated last year
- ☆20 Updated 3 months ago
- Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet ☆32 Updated 2 years ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆19 Updated last year
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆55 Updated last year
- Unofficial Implementation of Selective Attention Transformer ☆20 Updated last year
- Sparse and discrete interpretability tool for neural networks ☆64 Updated last year
- ☆19 Updated 10 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆26 Updated last year
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆32 Updated last month
- [ICML 24] A novel automated neuron explanation framework that can accurately describe poly-semantic concepts in deep neural networks ☆14 Updated 9 months ago
- Multi-Layer Sparse Autoencoders (ICLR 2025) ☆29 Updated this week
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆83 Updated last year
- Code repo for the model organisms and convergent directions of EM papers. ☆48 Updated 4 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆63 Updated last year
- Decomposing and Editing Predictions by Modeling Model Computation ☆139 Updated last year
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆42 Updated 4 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆102 Updated 3 months ago