Code to enable layer-level steering in LLMs using sparse autoencoders
☆32Sep 18, 2025Updated 8 months ago
Alternatives and similar repositories for sae-steering
Users that are interested in sae-steering are comparing it to the libraries listed below.
- A TinyStories LM with SAEs and transcoders☆14Apr 3, 2025Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery"☆47May 31, 2024Updated 2 years ago
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)☆20Jan 19, 2025Updated last year
- The official code for paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"☆10Jun 23, 2024Updated last year
- ☆10Dec 4, 2024Updated last year
- [ICLR 2025] General-purpose activation steering library☆172Sep 18, 2025Updated 8 months ago
- Archive of questions from the Cambridge Mathematics Tripos☆10Jun 6, 2022Updated 4 years ago
- ☆35Jun 13, 2025Updated 11 months ago
- A curated list of resources for activation engineering☆139Oct 2, 2025Updated 8 months ago
- 【ICLR 2025 🔥】MMKE-Bench, a challenging benchmark for evaluating diverse semantic editing in real-world scenarios.☆23Apr 19, 2025Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability☆35Mar 8, 2025Updated last year
- ☆10Nov 28, 2023Updated 2 years ago
- ☆11Oct 28, 2022Updated 3 years ago
- ☆14May 8, 2023Updated 3 years ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"☆21Dec 14, 2024Updated last year
- A library for efficient patching and automatic circuit discovery.☆97Dec 31, 2025Updated 5 months ago
- ☆16Feb 24, 2022Updated 4 years ago
- Code release for the paper "Style Vectors for Steering Generative Large Language Models", accepted to the Findings of the EACL 2024.☆36Sep 26, 2024Updated last year
- ☆18May 19, 2026Updated 3 weeks ago
- Code Release for the 2023 NeurIPS Paper How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained langua…☆17Dec 6, 2024Updated last year
- Exploring the Limitations of Large Language Models on Multi-Hop Queries☆33Mar 2, 2025Updated last year
- The Happy Faces Benchmark☆15Jul 20, 2023Updated 2 years ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".☆68Aug 15, 2025Updated 9 months ago
- HSTGODE code☆11Nov 26, 2023Updated 2 years ago
- Sparse Autoencoder Training Library☆57May 1, 2025Updated last year
- helper functions for processing and integrating visual language information with Qwen-VL Series Model☆17Aug 30, 2024Updated last year
- ☆21Apr 3, 2026Updated 2 months ago
- ☆16Oct 23, 2023Updated 2 years ago
- minimal diffusion transformer in pytorch.☆17Oct 6, 2024Updated last year
- PyTorch implementation of OpenAI's Procgen ppo-baseline, built from scratch.☆14May 17, 2024Updated 2 years ago
- Algebraic value editing in pretrained language models☆70Nov 1, 2023Updated 2 years ago
- A toolkit to induce interpretable workflows from raw computer-use activities.☆44Nov 13, 2025Updated 6 months ago
- ☆14May 18, 2026Updated 3 weeks ago
- (Model-written) LLM evals library☆18Dec 13, 2024Updated last year
- ICLR 2024: Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations☆23May 1, 2025Updated last year
- A free AI video generation nonebot plugin, supporting text-to-video and image-and-text-to-video☆10May 7, 2025Updated last year
- SDLC Copilot is an Agentic AI system designed to streamline and automate the Software Development Lifecycle (SDLC). From requirement gath…☆26Jun 14, 2025Updated 11 months ago
- Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).☆17Jan 8, 2025Updated last year
- Implementing DP/TP/PP using torch.distributed☆15Dec 28, 2023Updated 2 years ago