Prompts used in the Automated Auditing Blog Post
☆150Jul 24, 2025Updated 9 months ago
Alternatives and similar repositories for automated-auditing
Users who are interested in automated-auditing are comparing it to the libraries listed below.
- Open Source Replication of Anthropic's Alignment Faking Paper☆56Apr 4, 2025Updated last year
- ☆596Jun 19, 2025Updated 10 months ago
- 🤖 Complete reproduction of 'AlphaGo Moment for Model Architecture Discovery' using MLX-LM instead of GPT-4. Autonomous neural architectu…☆29Jul 27, 2025Updated 9 months ago
- ☆20Apr 10, 2025Updated last year
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models☆10Feb 20, 2025Updated last year
- Code repo for the model organisms and convergent directions of EM papers.☆62Sep 22, 2025Updated 7 months ago
- ☆81Feb 18, 2026Updated 2 months ago
- Stochastic Parameter Decomposition☆73Updated this week
- ☆130Feb 10, 2026Updated 2 months ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses☆1,004Updated this week
- ☆49May 27, 2025Updated 11 months ago
- PyTorch implementation of OpenAI's Procgen PPO baseline, built from scratch.☆14May 17, 2024Updated last year
- ☆281Oct 1, 2024Updated last year
- ☆42Jul 4, 2025Updated 10 months ago
- ☆48Mar 19, 2026Updated last month
- ☆35Feb 20, 2025Updated last year
- Interpreting how transformers simulate agents performing RL tasks☆90Oct 23, 2023Updated 2 years ago
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments.☆189Apr 27, 2026Updated last week
- ☆20Oct 5, 2025Updated 7 months ago
- Simple tool to identify and remediate the use of the AWS EC2 IMDSv1.☆15Aug 12, 2021Updated 4 years ago
- A basic ls replacement, written in Rust, using Cursor AI and Geoffrey Huntley's techniques☆32Mar 3, 2025Updated last year
- ☆94Updated this week
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…☆10Oct 7, 2024Updated last year
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489)☆19Oct 22, 2024Updated last year
- @ngrok/mantle ui component library | https://develop.mantle.ngrok.com☆13Updated this week
- ☆15Apr 26, 2025Updated last year
- CVE-2025-64155: Fortinet FortiSIEM Argument Injection to Remote Code Execution☆31Jan 13, 2026Updated 3 months ago
- Monte Carlo tree search for the travelling salesman problem (MCTS for the TSP)☆12Jun 18, 2022Updated 3 years ago
- ☆136Oct 16, 2025Updated 6 months ago
- ☆14Jul 12, 2024Updated last year
- ☆20Jan 21, 2023Updated 3 years ago
- ☆26Feb 23, 2026Updated 2 months ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.☆73Apr 15, 2026Updated 2 weeks ago
- The nnsight package enables interpreting and manipulating the internals of deep learned models.☆917Updated this week
- Open-sourced evaluation suite from the Monitoring Monitorability paper☆69Apr 22, 2026Updated last week
- ☆23Aug 1, 2025Updated 9 months ago
- ☆16Feb 24, 2025Updated last year
- LLMs playing chess are sensitive to how the position came to be☆25Feb 14, 2024Updated 2 years ago
- Semantic search over every Emergent Ventures winner.☆30Apr 15, 2026Updated 2 weeks ago