chawins / llm-sp
Papers and resources related to the security and privacy of LLMs
★ 433 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for llm-sp
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (★ 341 · updated 3 months ago)
- The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models" (★ 245 · updated 3 weeks ago)
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] (★ 232 · updated last month)
- A collection of automated evaluators for assessing jailbreak attempts. (★ 75 · updated 4 months ago)
- Papers about red teaming LLMs and Multimodal models. (★ 78 · updated last month)
- A curation of awesome tools, documents and projects about LLM Security. (★ 955 · updated this week)
- TAP: An automated jailbreaking method for black-box LLMs (★ 119 · updated 8 months ago)
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts (★ 403 · updated last month)
- A reading list for large-model safety, security, and privacy (including Awesome LLM Security, Safety, etc.). (★ 948 · updated this week)
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… (★ 241 · updated 8 months ago)
- This repository provides an implementation to formalize and benchmark prompt injection attacks and defenses (★ 146 · updated 2 months ago)
- An up-to-date, curated list of awesome papers, methods and resources on attacks against large vision-language models (★ 133 · updated last week)
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey (★ 76 · updated 3 months ago)
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873) (★ 122 · updated 6 months ago)
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models (★ 93 · updated last month)
- A Comprehensive Assessment of Trustworthiness in GPT Models (★ 260 · updated 2 months ago)
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide… (★ 1,005 · updated this week)
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Models (★ 468 · updated last month)
- A resource repository for machine unlearning in large language models (★ 218 · updated last week)
- An easy-to-use Python framework to generate adversarial jailbreak prompts. (★ 479 · updated 2 months ago)
- A fast + lightweight implementation of the GCG algorithm in PyTorch (★ 127 · updated last month)
- LLM security and privacy (★ 41 · updated last month)
- Repository for the AAAI 2024 (Oral) paper "Visual Adversarial Examples Jailbreak Large Language Models" (★ 183 · updated 6 months ago)
- Awesome LLM Jailbreak academic papers (★ 76 · updated last year)
- Jailbreaking Large Vision-language Models via Typographic Visual Prompts (★ 87 · updated 6 months ago)
- Accepted by IJCAI-24 Survey Track (★ 159 · updated 2 months ago)
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024] (★ 220 · updated 2 months ago)
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding (★ 99 · updated 4 months ago)
- Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data… (★ 357 · updated this week)