ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact with and break out of shell environments using the OverTheWire wargames, showing the models' surprising ability to carry out action-oriented cyberexploits.
☆11 · Updated last year
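The project's premise, an LLM issuing shell commands in a feedback loop against wargame levels, can be sketched roughly as follows. This is a minimal illustration, not the repository's actual harness: `query_llm` is a placeholder for any chat-completion call, and commands run in a local shell here, whereas the real benchmark would execute them over SSH against the OverTheWire servers.

```python
import subprocess

def query_llm(transcript: str) -> str:
    """Placeholder for a real chat-completion call: given the session
    transcript so far, return either the next shell command to try or
    'FLAG: <value>' once the model believes it has found the password."""
    return "cat readme"  # canned command for illustration only

def solve_level(goal: str, max_steps: int = 10) -> str | None:
    """Drive the command/observe loop the description alludes to:
    ask the model for an action, execute it, append the output to the
    transcript, and repeat until the model claims the flag or the
    step budget runs out."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        action = query_llm(transcript).strip()
        if action.startswith("FLAG:"):  # model claims success
            return action.removeprefix("FLAG:").strip()
        # NOTE: running locally for brevity; a real harness would run
        # this over an SSH session into the wargame host instead.
        result = subprocess.run(action, shell=True, capture_output=True,
                                text=True, timeout=30)
        transcript += f"$ {action}\n{result.stdout}{result.stderr}\n"
    return None  # step budget exhausted without a flag
```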
Alternatives and similar repositories for llm-security-challenge:
Users interested in llm-security-challenge are comparing it to the libraries listed below.
- Whispers in the Machine: Confidentiality in LLM-integrated Systems ☆35 · Updated 3 weeks ago
- Risks and targets for assessing LLMs & LLM vulnerabilities ☆30 · Updated 10 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆49 · Updated 7 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆109 · Updated 9 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. ☆23 · Updated 10 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆67 · Updated last year
- Code to break Llama Guard ☆31 · Updated last year
- LLM security and privacy ☆48 · Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆98 · Updated last year
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated 11 months ago
- A repository of Language Model Vulnerabilities and Exposures (LVEs). ☆108 · Updated last year
- Dataset for the Tensor Trust project ☆39 · Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆105 · Updated 2 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆116 · Updated last week
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs ☆59 · Updated 5 months ago
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks ☆39 · Updated 9 months ago
- Code used to run the platform for the LLM CTF colocated with SaTML 2024 ☆26 · Updated last year
- ☆42 · Updated 2 years ago
- ☆57 · Updated 9 months ago
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆61 · Updated last year
- [ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents ☆79 · Updated last month
- Fluent student-teacher redteaming ☆20 · Updated 8 months ago
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins ☆25 · Updated 8 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆135 · Updated last year
- ☆42 · Updated 8 months ago
- ☆90 · Updated last month
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆289 · Updated 2 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆43 · Updated 5 months ago
- ☆31 · Updated last year
- ☆22 · Updated 7 months ago