TransluceAI/jailbreaking-frontier-models

TransluceAI / jailbreaking-frontier-models

☆28

Alternatives and similar repositories for jailbreaking-frontier-models

Users who are interested in jailbreaking-frontier-models are comparing it to the libraries listed below.

microsoft / llmail-inject-challenge
Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge
☆25 · Apr 9, 2026 · Updated 3 months ago
scaleapi / mrt
https://scale.com/research/mrt
☆20 · Mar 16, 2026 · Updated 4 months ago
aisa-group / promptinject-agent-skills
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
☆21 · Jul 2, 2026 · Updated 2 weeks ago
safety-research / false-facts
☆50 · Jul 4, 2025 · Updated last year
curt-tigges / probity
☆19 · Apr 10, 2025 · Updated last year
ajyl / mech_int_othelloGPT
☆10 · Nov 6, 2024 · Updated last year
GraySwanAI / ipi_arena_os
☆42 · Mar 18, 2026 · Updated 4 months ago
TransluceAI / docent
☆114 · Jul 10, 2026 · Updated last week
danielreuter / autofunction
A metaprogramming language that compiles from types
☆10 · Jun 26, 2024 · Updated 2 years ago
rishub-tamirisa / tamper-resistance
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
☆68 · Jun 9, 2025 · Updated last year
SampsonML / DiscoverPhysics
☆16 · May 31, 2026 · Updated last month
bryant1410 / slurm-cheatsheet
☆13 · Mar 29, 2024 · Updated 2 years ago
pralab / IndicatorsOfAttackFailure
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
☆19 · May 23, 2022 · Updated 4 years ago
tml-epfl / sam-low-rank-features
Sharpness-Aware Minimization Leads to Low-Rank Features [NeurIPS 2023]
☆29 · Sep 22, 2023 · Updated 2 years ago
dreadnode / research
General research for Dreadnode
☆28 · Jun 17, 2024 · Updated 2 years ago
mueller-mp / SAM-ON
☆34 · Jan 25, 2024 · Updated 2 years ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
invariantlabs-ai / explorer
A better way of testing, inspecting, and analyzing AI Agent traces.
☆58 · Jan 12, 2026 · Updated 6 months ago
edeyneka / pdf-reader-extension
☆13 · Mar 9, 2025 · Updated last year
allenai / wildteaming
☆42 · Aug 10, 2024 · Updated last year
Call-for-Code / UnityStarterKit
This is a sample project for getting started with Unity and data visualization.
☆11 · Jun 5, 2020 · Updated 6 years ago
goodfire-ai / scribe
☆85 · Feb 18, 2026 · Updated 5 months ago
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆266 · Sep 24, 2024 · Updated last year
smartyfh / DST-ASSIST
ASSIST: Towards Label Noise-Robust Dialogue State Tracking
☆10 · Apr 11, 2022 · Updated 4 years ago
Alrope123 / prompt-waywardness
☆14 · Apr 27, 2022 · Updated 4 years ago
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33 · Jan 23, 2025 · Updated last year
naver-ai / cs-shortcut
Saving Dense Retriever from Shortcut Dependency in Conversational Search (EMNLP 2022)
☆18 · Nov 24, 2022 · Updated 3 years ago
qingyue2014 / MoE4DST
☆12 · Jul 18, 2023 · Updated 3 years ago
KellerJordan / top-sgd
Optimization algorithm which fits a ResNet to CIFAR-10 5x faster than SGD / Adam (with terrible generalization)
☆14 · Oct 20, 2023 · Updated 2 years ago
SoYoungCho / Korean-English-NMT
Neural Machine Translation model for Capstone Project
☆11 · Apr 11, 2020 · Updated 6 years ago
weiyezhimeng / SQL-Injection-Jailbreak
☆22 · Jul 26, 2025 · Updated 11 months ago
facebookresearch / rl-injector
Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks"
☆53 · May 6, 2026 · Updated 2 months ago
jplhughes / dotfiles
Easily deploy my zsh and tmux configuration on new machines. Includes local and remote aliases to improve workflow.
☆15 · Apr 23, 2026 · Updated 2 months ago
science-of-finetuning / diffing-toolkit
A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.
☆78 · Updated this week
microsoft / iclr2019-learning-to-represent-edits
Code for the ICLR 2019 paper "Learning to Represent Edits"
☆13 · Dec 8, 2022 · Updated 3 years ago
jason9693 / polyglot-finetuning-oslo
☆19 · Sep 20, 2022 · Updated 3 years ago
RapidResponseBench / rapidresponsebench
☆35 · Nov 12, 2024 · Updated last year
uzaymacar / blackjack-with-gui
A Blackjack game with GUI written in Java.
☆11 · Nov 21, 2018 · Updated 7 years ago
liuchen11 / AdversaryLossLandscape
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them [NeurIPS 2020]
☆36 · Jul 3, 2021 · Updated 5 years ago