Prompts used in the Automated Auditing Blog Post
☆143 · Jul 24, 2025 · Updated 7 months ago
Alternatives and similar repositories for automated-auditing
Users that are interested in automated-auditing are comparing it to the libraries listed below.
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting ☆18 · Apr 15, 2025 · Updated 11 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 11 months ago
- ☆572 · Jun 19, 2025 · Updated 9 months ago
- 🤖 Complete reproduction of 'AlphaGo Moment for Model Architecture Discovery' using MLX-LM instead of GPT-4. Autonomous neural architectu… ☆27 · Jul 27, 2025 · Updated 7 months ago
- Residual Quantization Autoencoder, used for interpreting LLMs ☆14 · Jan 1, 2025 · Updated last year
- Open-source audio embedding models, submitted to the HEAR 2021 challenge ☆11 · Feb 15, 2026 · Updated last month
- ☆20 · Apr 10, 2025 · Updated 11 months ago
- Code repo for the model organisms and convergent directions of EM papers. ☆56 · Sep 22, 2025 · Updated 6 months ago
- Make open-weight LLM agents play the game "Among Us", and study how the models learn and express lying and deception in the game. ☆28 · Dec 17, 2025 · Updated 3 months ago
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models ☆10 · Feb 20, 2025 · Updated last year
- ☆23 · Jun 22, 2025 · Updated 9 months ago
- ☆79 · Feb 18, 2026 · Updated last month
- Stochastic Parameter Decomposition ☆68 · Updated this week
- ☆13 · Mar 7, 2022 · Updated 4 years ago
- An alignment auditing agent capable of quickly exploring alignment hypothesis ☆956 · Mar 12, 2026 · Updated last week
- ☆48 · May 27, 2025 · Updated 9 months ago
- Pytorch implementation on OpenAI's Procgen ppo-baseline, built from scratch. ☆14 · May 17, 2024 · Updated last year
- ☆39 · Jul 4, 2025 · Updated 8 months ago
- ☆48 · Feb 13, 2026 · Updated last month
- ☆35 · Feb 20, 2025 · Updated last year
- Interpreting how transformers simulate agents performing RL tasks ☆90 · Oct 23, 2023 · Updated 2 years ago
- Simple tool to identify and remediate the use of the AWS EC2 IMDSv1. ☆15 · Aug 12, 2021 · Updated 4 years ago
- ppx_system is a syntax extension to known operating system at compile time ☆12 · May 9, 2023 · Updated 2 years ago
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing… ☆10 · Oct 7, 2024 · Updated last year
- ☆13 · Jul 12, 2024 · Updated last year
- ☆15 · Apr 26, 2025 · Updated 10 months ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. ☆69 · Updated this week
- ☆134 · Oct 16, 2025 · Updated 5 months ago
- This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix … ☆134 · Feb 8, 2026 · Updated last month
- ☆20 · Jan 21, 2023 · Updated 3 years ago
- ☆24 · Feb 23, 2026 · Updated last month
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆863 · Updated this week
- ☆21 · Jul 21, 2025 · Updated 8 months ago
- utilities for batched llm calls with retries ☆49 · Updated this week
- Auditing agents for fine-tuning safety ☆20 · Oct 21, 2025 · Updated 5 months ago
- Ingestion pipeline for blr.today ☆13 · Updated this week
- Semantic search over every Emergent Ventures winner. ☆29 · Feb 26, 2026 · Updated 3 weeks ago
- ☆18 · Dec 10, 2025 · Updated 3 months ago
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts ☆17 · Mar 11, 2025 · Updated last year