theshi-1128 / llm-defenseLinks
An easy-to-use Python framework to defend against jailbreak prompts.
β21Updated 6 months ago
Alternatives and similar repositories for llm-defense
Users that are interested in llm-defense are comparing it to the libraries listed below
Sorting:
- Safety at Scale: A Comprehensive Survey of Large Model Safetyβ194Updated 7 months ago
- π up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.β394Updated last week
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Modelsβ202Updated 7 months ago
- Papers and resources related to the security and privacy of LLMs π€β536Updated 4 months ago
- DSN jailbreak Attack & Evaluation Ensembleβ10Updated 2 months ago
- β65Updated 4 months ago
- [ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Mβ¦β383Updated 8 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Promptsβ173Updated 3 months ago
- [NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Modelsβ219Updated 2 weeks ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by farβ21Updated 6 months ago
- β36Updated last year
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Surveyβ106Updated last year
- Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)β64Updated this week
- An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)β191Updated 2 years ago
- Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Modelsβ240Updated last year
- β21Updated last year
- A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).β1,691Updated this week
- Agent Security Bench (ASB)β124Updated this week
- Accepted by IJCAI-24 Survey Trackβ216Updated last year
- Accepted by ECCV 2024β158Updated 11 months ago
- β46Updated last year
- β32Updated 6 months ago
- A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Modelsβ234Updated last month
- The official implementation of our NAACL 2024 paper "A Wolf in Sheepβs Clothing: Generalized Nested Jailbreak Prompts can Fool Large Langβ¦β137Updated last month
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]β424Updated 6 months ago
- β15Updated last year
- β223Updated last month
- β82Updated last month
- β104Updated 8 months ago
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)β99Updated 8 months ago