safety-research/petri

An alignment auditing agent capable of quickly exploring alignment hypotheses

☆982

Alternatives and similar repositories for petri

Users who are interested in petri are comparing it to the repositories listed below.

safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆113 · Mar 23, 2026 · Updated 2 weeks ago
anthropic-experimental / agentic-misalignment
☆588 · Jun 19, 2025 · Updated 9 months ago
safety-research / false-facts
☆40 · Jul 4, 2025 · Updated 9 months ago
UKGovernmentBEIS / inspect_ai
Inspect: A framework for large language model evaluations
☆1,890 · Updated this week
anthropic-experimental / automated-auditing
Prompts used in the Automated Auditing Blog Post
☆148 · Jul 24, 2025 · Updated 8 months ago
longtermrisk / openweights
A python sdk for LLM finetuning and inference on runpod infrastructure
☆25 · Updated this week
TransluceAI / introspective-interp
Repository for "Training Language Models To Explain Their Own Computations"
☆21 · Dec 22, 2025 · Updated 3 months ago
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆56 · Apr 4, 2025 · Updated last year
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆135 · Feb 15, 2026 · Updated last month
safety-research / SHADE-Arena
☆24 · Jun 22, 2025 · Updated 9 months ago
METR / eval-analysis-public
Public repository containing METR's DVC pipeline for eval data analysis
☆252 · Mar 6, 2026 · Updated last month
rgreenblatt / control-evaluations
☆23 · May 25, 2024 · Updated last year
jkutaso / SHADE-Arena
☆43 · May 9, 2025 · Updated 11 months ago
safety-research / safety-examples
☆25 · Nov 11, 2025 · Updated 5 months ago
oclivegriffin / crosscode
A library for training crosscoders
☆16 · May 28, 2025 · Updated 10 months ago
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆71 · Updated this week
alexander-turner / attainable-utility-preservation
☆11 · Jun 2, 2021 · Updated 4 years ago
Jiaxin-Wen / MisleadLM
Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF"
☆19 · Oct 11, 2024 · Updated last year
Jiaxin-Wen / Unsupervised-Elicitation
☆41 · Jul 6, 2025 · Updated 9 months ago
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆424 · Updated this week
ajobi-uhc / seer
This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix …
☆137 · Feb 8, 2026 · Updated 2 months ago
HyperPotatoNeo / RSA
☆147 · Sep 29, 2025 · Updated 6 months ago
AI-ANK / c3-python-nostream
Python Server for C3 AI app. A project that brings the power of Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) with…
☆24 · Jan 7, 2024 · Updated 2 years ago
UKGovernmentBEIS / hibayes
☆48 · Mar 19, 2026 · Updated 3 weeks ago
Cadenza-Labs / sleeper-agents
☆13 · Jul 12, 2024 · Updated last year
jacobdunefsky / llm-steering-opt
Tools for optimizing steering vectors in LLMs.
☆21 · Apr 10, 2025 · Updated last year
harish-kamath / rqae
Residual Quantization Autoencoder, used for interpreting LLMs
☆14 · Jan 1, 2025 · Updated last year
curt-tigges / crosslayer-coding
☆17 · Jul 9, 2025 · Updated 9 months ago
TransluceAI / jailbreaking-frontier-models
☆25 · Sep 3, 2025 · Updated 7 months ago
shadsidd / continuous-security-assessment-tool
A Python-based security assessment tool for continuous automated security scanning and monitoring of domains.
☆13 · Apr 4, 2025 · Updated last year
Sakil786 / llama4_trip_planning_agent
llama4_trip_planning_agent
☆12 · Apr 5, 2025 · Updated last year
callummcdougall / ARENA_3.0
☆1,031 · Mar 29, 2026 · Updated 2 weeks ago
ActiveInferenceInstitute / GeneralizedNotationNotation
☆23 · Updated this week
EleutherAI / sparsify
Sparsify transformers with SAEs and transcoders
☆704 · Updated this week
vineethsai / vulnerablemcp
A comprehensive database of Model Context Protocol vulnerabilities, security research, and exploits
☆36 · Feb 16, 2026 · Updated last month
collinear-ai / spider
Streamline on-policy/off-policy distillation workflows in a few lines of code
☆98 · Feb 26, 2026 · Updated last month
UKGovernmentBEIS / inspect_k8s_sandbox
A Kubernetes sandbox environment for use with inspect_ai
☆29 · Updated this week
JacobPfau / procgenAISC
☆20 · Jan 21, 2023 · Updated 3 years ago
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆260 · Sep 24, 2024 · Updated last year