ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact with and break out of shell environments using the OverTheWire wargames, showing the models' surprising ability to carry out action-oriented cyberexploits.
☆12 Updated last year
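A minimal sketch of the kind of interaction loop described above, assuming an SSH connection to the OverTheWire Bandit server; `query_llm()` and the level goal are hypothetical placeholders, not code from this repository:

```python
# Hypothetical sketch: drive an OverTheWire "Bandit" level by letting an LLM
# propose shell commands over SSH. query_llm() is a placeholder for any model API.
import paramiko

def query_llm(transcript: str) -> str:
    """Placeholder: return the next shell command given the session transcript."""
    raise NotImplementedError("plug in your preferred LLM client here")

def run_level(username: str, password: str, goal: str, max_steps: int = 10) -> str:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("bandit.labs.overthewire.org", port=2220,
                   username=username, password=password)
    transcript = f"Goal: {goal}\n"
    try:
        for _ in range(max_steps):
            cmd = query_llm(transcript).strip()           # model proposes a command
            _, stdout, stderr = client.exec_command(cmd)  # run it on the wargame host
            output = stdout.read().decode() + stderr.read().decode()
            transcript += f"$ {cmd}\n{output}\n"          # feed the result back to the model
    finally:
        client.close()
    return transcript

# Example (hypothetical goal text): recover the bandit1 password from bandit0.
# print(run_level("bandit0", "bandit0", "read the password stored in ~/readme"))
```

Appending each command and its output to a running transcript gives the model the context it needs to plan the next step, which is the basic pattern behind this kind of action-oriented evaluation.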
Alternatives and similar repositories for llm-security-challenge
Users interested in llm-security-challenge are comparing it to the repositories listed below
- Whispers in the Machine: Confidentiality in Agentic Systems ☆37 Updated last week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆102 Updated last year
- LLM security and privacy ☆49 Updated 7 months ago
- Risks and targets for assessing LLMs & LLM vulnerabilities ☆30 Updated 11 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆50 Updated 9 months ago
- Code to break Llama Guard ☆31 Updated last year
- ☆39 Updated 7 months ago
- ☆100 Updated 2 months ago
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis… ☆67 Updated last year
- ☆62 Updated 5 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. ☆23 Updated last year
- ☆62 Updated 10 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆111 Updated 11 months ago
- This repository provides a benchmark for prompt injection attacks and defenses ☆196 Updated 2 weeks ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆154 Updated last week
- ☆55 Updated 4 months ago
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins ☆25 Updated 9 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆68 Updated last year
- AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks ☆45 Updated 11 months ago
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆156 Updated last month
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆46 Updated 6 months ago
- LLM Self Defense: By Self Examination, LLMs know they are being tricked ☆32 Updated 11 months ago
- Papers about red teaming LLMs and Multimodal models. ☆115 Updated 5 months ago
- A collection of prompt injection mitigation techniques. ☆22 Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆118 Updated last month
- TAP: An automated jailbreaking method for black-box LLMs ☆167 Updated 5 months ago
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆36 Updated 2 weeks ago
- Implementation of BEAST adversarial attack for language models (ICML 2024) ☆86 Updated last year
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆62 Updated 6 months ago
- ☆20 Updated last year