ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact with and break out of shell environments using the OverTheWire wargames, showing the models' surprising ability to carry out action-oriented cyberexploits in a shell.
☆11 · Updated last year
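The core setup is an agent loop: the model sees the shell transcript so far, proposes the next command, and the command's output is fed back until the level goal (e.g., the next level's password) is found. Below is a minimal sketch of such a loop, assuming SSH access to an OverTheWire Bandit level via paramiko and a placeholder `query_llm` helper; the helper name and loop structure are illustrative assumptions, not this repository's implementation.

```python
# Illustrative sketch only: an LLM-driven shell loop for an OverTheWire-style level.
# Assumptions (not from this repo): `query_llm` returns the next shell command
# given the transcript so far; the level is reachable over SSH with paramiko.
import paramiko


def query_llm(transcript: str) -> str:
    """Placeholder: ask a model for the next shell command given the transcript."""
    raise NotImplementedError


def solve_level(user: str, password: str, max_steps: int = 20) -> str:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Bandit levels are served over SSH on port 2220.
    client.connect("bandit.labs.overthewire.org", port=2220,
                   username=user, password=password)
    transcript = "Goal: find the password for the next level.\n"
    try:
        for _ in range(max_steps):
            # Ask the model for the next command, run it, and append the result.
            command = query_llm(transcript).strip()
            _, stdout, stderr = client.exec_command(command, timeout=30)
            output = stdout.read().decode() + stderr.read().decode()
            transcript += f"$ {command}\n{output}\n"
            if "password" in output.lower():  # crude stopping heuristic
                break
    finally:
        client.close()
    return transcript
```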
Related projects
Alternatives and complementary repositories for llm-security-challenge
- Whispers in the Machine: Confidentiality in LLM-integrated Systems ☆29 · Updated 2 weeks ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆46 · Updated 3 months ago
- LLM security and privacy ☆41 · Updated last month
- Risks and targets for assessing LLMs & LLM vulnerabilities ☆25 · Updated 5 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024] ☆220 · Updated 2 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆107 · Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆84 · Updated 8 months ago
- This repository provides an implementation to formalize and benchmark prompt injection attacks and defenses ☆146 · Updated 2 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆65 · Updated this week
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆64 · Updated 9 months ago
- TAP: An automated jailbreaking method for black-box LLMs ☆119 · Updated 8 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs ☆45 · Updated 2 weeks ago
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability ☆110 · Updated 2 months ago
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆59 · Updated 3 months ago
- A fast + lightweight implementation of the GCG algorithm in PyTorch ☆127 · Updated last month
- Papers about red teaming LLMs and Multimodal models. ☆78 · Updated last month
- ☆38 · Updated 4 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆34 · Updated 3 weeks ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆107 · Updated 8 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆42 · Updated 6 months ago
- Code to break Llama Guard ☆30 · Updated 11 months ago
- ☆153 · Updated 11 months ago
- Code for Voice Jailbreak Attacks Against GPT-4o. ☆26 · Updated 5 months ago
- Code to conduct an embedding attack on LLMs ☆19 · Updated last month
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks ☆28 · Updated 5 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆82 · Updated 6 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. ☆16 · Updated 6 months ago
- ☆17 · Updated 10 months ago
- ☆39 · Updated 9 months ago
- ☆31 · Updated last year