☆30 · Aug 2, 2024 · Updated last year
Alternatives and similar repositories for steering-llama3
Users who are interested in steering-llama3 are comparing it to the libraries listed below
- Steering vectors for transformer language models in Pytorch / Huggingface ☆139 · Feb 21, 2025 · Updated last year
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆13 · Jan 26, 2025 · Updated last year
- Sparse Autoencoder Training Library ☆55 · May 1, 2025 · Updated 10 months ago
- [ICLR 2025] General-purpose activation steering library ☆144 · Sep 18, 2025 · Updated 5 months ago
- A Chrome extension that allows you to export your Claude.ai conversations in various formats (JSON, Markdown, Plain Text) with support fo… ☆34 · Oct 27, 2025 · Updated 4 months ago
- ☆19 · Mar 5, 2024 · Updated 2 years ago
- A library for mechanistic anomaly detection ☆22 · Jan 9, 2025 · Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 · Oct 18, 2024 · Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆247 · Updated this week
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆27 · Nov 20, 2024 · Updated last year
- A library for making RepE control vectors ☆691 · Sep 24, 2025 · Updated 5 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 · May 23, 2024 · Updated last year
- Algebraic value editing in pretrained language models ☆69 · Nov 1, 2023 · Updated 2 years ago
- ☆209 · Oct 14, 2025 · Updated 4 months ago
- ☆153 · Dec 30, 2025 · Updated 2 months ago
- A resource repository for representation engineering in large language models ☆148 · Nov 14, 2024 · Updated last year
- ☆36 · Jul 14, 2022 · Updated 3 years ago
- ☆11 · Jun 20, 2023 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Sep 24, 2024 · Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆47 · May 31, 2024 · Updated last year
- Erasing concepts from neural representations with provable guarantees ☆243 · Jan 27, 2025 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆85 · Mar 7, 2025 · Updated 11 months ago
- ☆36 · Apr 30, 2024 · Updated last year
- Sparsify transformers with SAEs and transcoders ☆699 · Updated this week
- PyTorch library to accelerate super-resolution research ☆11 · Jun 23, 2024 · Updated last year
- [ICCV 2023] "TRM-UAP: Enhancing the Transferability of Data-Free Universal Adversarial Perturbation via Truncated Ratio Maximization", Yi… ☆13 · Jul 17, 2024 · Updated last year
- COMMS Software for UPSat ☆12 · Dec 17, 2018 · Updated 7 years ago
- ☆12 · Jun 26, 2024 · Updated last year
- ☆10 · Oct 11, 2022 · Updated 3 years ago
- Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning) ☆10 · Sep 7, 2020 · Updated 5 years ago
- ☆100 · Aug 8, 2024 · Updated last year
- Lime is an active hook manager which allows fillers or market makers to set price and fill Intent / RFQ based swap requests. ☆11 · Sep 24, 2023 · Updated 2 years ago
- Utils to view, curate, pseudonymize, and anonymize DICOM tags and to copy DICOM files. ☆11 · Oct 15, 2025 · Updated 4 months ago
- Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024) ☆16 · Sep 28, 2024 · Updated last year
- [CVPR'25] AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data ☆17 · Mar 27, 2025 · Updated 11 months ago
- Dynamic sparse bounded octree in Go ☆11 · Dec 7, 2020 · Updated 5 years ago
- Simple rules based grapheme to phoneme in Python ☆11 · Sep 2, 2017 · Updated 8 years ago
- ☆15 · Aug 19, 2025 · Updated 6 months ago