ZiyueWang25 / llm-security-challengeLinks
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact and break out of shell environments using the OverTheWire wargames environment, showing the models' surprising ability to do action-oriented cyberexploits in shell environments
☆13Updated 2 years ago
Alternatives and similar repositories for llm-security-challenge
Users that are interested in llm-security-challenge are comparing it to the libraries listed below
Sorting:
- Whispers in the Machine: Confidentiality in Agentic Systems☆41Updated last week
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks☆55Updated 4 months ago
- Risks and targets for assessing LLMs & LLM vulnerabilities☆32Updated last year
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions.☆26Updated last year
- An Execution Isolation Architecture for LLM-Based Agentic Systems☆92Updated 8 months ago
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins☆28Updated last year
- ☆86Updated 10 months ago
- LLM security and privacy☆51Updated 11 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]☆350Updated 8 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models☆55Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a…☆425Updated last year
- This repository provides a benchmark for prompt Injection attacks and defenses☆289Updated 2 months ago
- ☆80Updated last year
- Code used to run the platform for the LLM CTF colocated with SaTML 2024☆26Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"☆155Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".☆118Updated last year
- ☆62Updated 9 months ago
- Official repo for Customized but Compromised: Assessing Prompt Injection Risks in User-Designed GPTs☆29Updated last year
- CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on…☆70Updated last week
- Code to break Llama Guard☆32Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.☆170Updated 6 months ago
- prompt attack-defense, prompt Injection, reverse engineering notes and examples | 提示词对抗、破解例子与笔记☆242Updated 7 months ago
- Automated Safety Testing of Large Language Models☆16Updated 8 months ago
- This project explores training data extraction attacks on the LLaMa 7B, GPT-2XL, and GPT-2-IMDB models to discover memorized content usin…☆14Updated 2 years ago
- ☆150Updated 3 months ago
- A collection of prompt injection mitigation techniques.☆24Updated 2 years ago
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…☆66Updated last month
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.☆288Updated last month
- ☆50Updated 11 months ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers☆62Updated last year