dmbeaglehole / neural_controllers
Code for steering and monitoring with concept vectors in LLMs. https://arxiv.org/abs/2502.03708
☆12 · Updated this week
Alternatives and similar repositories for neural_controllers
Users who are interested in neural_controllers are comparing it to the libraries listed below.
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆158 · Updated last month
- Efficient empirical NTKs in PyTorch ☆22 · Updated 3 years ago
- A fast, effective data attribution method for neural networks in PyTorch ☆215 · Updated 8 months ago
- ☆109 · Updated 3 weeks ago
- A centralized place for deep thinking code and experiments ☆85 · Updated 2 years ago
- ☆103 · Updated 6 months ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ☆41 · Updated 6 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆52 · Updated 10 months ago
- ☆125 · Updated last year
- Model Zoos published at the NeurIPS 2022 Dataset & Benchmark track: "Model Zoos: A Dataset of Diverse Populations of Neural Network Model… ☆55 · Updated 2 years ago
- ☆83 · Updated last year
- [ICLR 2025] General-purpose activation steering library ☆88 · Updated 2 weeks ago
- Codebase for Linguistic Collapse: Neural Collapse in (Large) Language Models [NeurIPS 2024] [arXiv:2405.17767] ☆13 · Updated 4 months ago
- ☆184 · Updated last year
- ☆223 · Updated last year
- ☆20 · Updated 3 months ago
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models" ☆24 · Updated last year
- ☆71 · Updated 3 years ago
- AI Logging for Interpretability and Explainability🔬 ☆125 · Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆210 · Updated 7 months ago
- Code for the ICLR 2022 paper. Salient Imagenet: How to discover spurious features in deep learning? ☆40 · Updated 2 years ago
- A library for efficient patching and automatic circuit discovery. ☆74 · Updated 3 weeks ago
- ☆12 · Updated 2 years ago
- ☆234 · Updated 10 months ago
- Erasing concepts from neural representations with provable guarantees ☆232 · Updated 6 months ago
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models". ☆103 · Updated 2 years ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆40 · Updated last year
- ☆235 · Updated last year
- ☆96 · Updated last year
- ☆70 · Updated 8 months ago