chawins/llm-sp
Papers and resources related to the security and privacy of LLMs
★393, updated last week
Related projects:
- A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.). (★750, updated this week)
- The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models". (★203, updated last month)
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Models (★432, updated last week)
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide… (★844, updated this week)
- A curation of awesome tools, documents and projects about LLM Security. (★873, updated 3 weeks ago)
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal (★275, updated last month)
- An Open Robustness Benchmark for Jailbreaking Language Models [arXiv 2024] (★169, updated last month)
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… (★219, updated 6 months ago)
- TAP: An automated jailbreaking method for black-box LLMs (★106, updated 6 months ago)
- This repository provides an implementation to formalize and benchmark Prompt Injection attacks and defenses. (★125, updated 2 weeks ago)
- A Comprehensive Assessment of Trustworthiness in GPT Models (★250, updated this week)
- Papers about red teaming LLMs and multimodal models. (★66, updated this week)
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts (★366, updated 5 months ago)
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey (★65, updated last month)
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873) (★110, updated 4 months ago)
- A collection of automated evaluators for assessing jailbreak attempts. (★55, updated 2 months ago)
- An easy-to-use Python framework to generate adversarial jailbreak prompts. (★403, updated 2 weeks ago)
- LLM hallucination paper list (★268, updated 6 months ago)
- LLM security and privacy (★38, updated 5 months ago)
- Must-read Papers on Knowledge Editing for Large Language Models. (★829, updated 2 weeks ago)
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024] (★181, updated last month)
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models (★61, updated this week)
- UP-TO-DATE LLM Watermark paper. (★253, updated 3 months ago)
- Jailbreaking Large Vision-Language Models via Typographic Visual Prompts (★76, updated 4 months ago)
- Repository for the paper (AAAI 2024, Oral): Visual Adversarial Examples Jailbreak Large Language Models (★156, updated 4 months ago)
- A survey of privacy problems in Large Language Models (LLMs). Contains a summary of the corresponding paper along with relevant code. (★58, updated 3 months ago)
- Awesome LLM Jailbreak academic papers (★61, updated 10 months ago)
- MarkLLM: An Open-Source Toolkit for LLM Watermarking. (★246, updated last month)