ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact with and break out of shell environments using the OverTheWire wargames, showing the models' surprising ability to carry out action-oriented cyberexploits in a shell.
☆11 · Updated last year
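The core setup is an agent loop: the model sees the shell transcript so far, proposes the next command, and the command's output is fed back until the level goal (e.g., the next level's password) is found. Below is a minimal sketch of such a loop, assuming SSH access to an OverTheWire Bandit level via paramiko and a placeholder `query_llm` helper; the helper name and loop structure are illustrative assumptions, not this repository's implementation.

```python
# Illustrative sketch only: an LLM-driven shell loop for an OverTheWire-style level.
# Assumptions (not from this repo): `query_llm` returns the next shell command
# given the transcript so far; the level is reachable over SSH with paramiko.
import paramiko


def query_llm(transcript: str) -> str:
    """Placeholder: ask a model for the next shell command given the transcript."""
    raise NotImplementedError


def solve_level(user: str, password: str, max_steps: int = 20) -> str:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Bandit levels are served over SSH on port 2220.
    client.connect("bandit.labs.overthewire.org", port=2220,
                   username=user, password=password)
    transcript = "Goal: find the password for the next level.\n"
    try:
        for _ in range(max_steps):
            # Ask the model for the next command, run it, and append the result.
            command = query_llm(transcript).strip()
            _, stdout, stderr = client.exec_command(command, timeout=30)
            output = stdout.read().decode() + stderr.read().decode()
            transcript += f"$ {command}\n{output}\n"
            if "password" in output.lower():  # crude stopping heuristic
                break
    finally:
        client.close()
    return transcript
```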
Related projects
Alternatives and complementary repositories for llm-security-challenge
- Whispers in the Machine: Confidentiality in LLM-integrated Systems ☆29 · Updated 2 weeks ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆46 · Updated 3 months ago
- LLM security and privacy ☆41 · Updated last month
- Risks and targets for assessing LLMs & LLM vulnerabilities ☆25 · Updated 5 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024] ☆220 · Updated 2 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆107 · Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆84 · Updated 8 months ago
- This repository provides an implementation to formalize and benchmark prompt injection attacks and defenses ☆146 · Updated 2 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆65 · Updated this week
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆64 · Updated 9 months ago
- TAP: An automated jailbreaking method for black-box LLMs ☆119 · Updated 8 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs ☆45 · Updated 2 weeks ago
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability ☆110 · Updated 2 months ago
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆59 · Updated 3 months ago
- A fast + lightweight implementation of the GCG algorithm in PyTorch ☆127 · Updated last month
- Papers about red teaming LLMs and Multimodal models. ☆78 · Updated last month
- ☆38 · Updated 4 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆34 · Updated 3 weeks ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆107 · Updated 8 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆42 · Updated 6 months ago
- Code to break Llama Guard ☆30 · Updated 11 months ago
- ☆153 · Updated 11 months ago
- Code for Voice Jailbreak Attacks Against GPT-4o. ☆26 · Updated 5 months ago
- Code to conduct an embedding attack on LLMs ☆19 · Updated last month
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks ☆28 · Updated 5 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆82 · Updated 6 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. ☆16 · Updated 6 months ago
- ☆17 · Updated 10 months ago
- ☆39 · Updated 9 months ago
- ☆31 · Updated last year