corca-ai / awesome-llm-security
A curation of awesome tools, documents and projects about LLM Security.
β956Updated this week
Related projects β
Alternatives and complementary repositories for awesome-llm-security
- Papers and resources related to the security and privacy of LLMs π€β441Updated 2 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusalβ343Updated 3 months ago
- A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).β955Updated this week
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provideβ¦β1,019Updated last week
- β413Updated 3 months ago
- Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Promptsβ404Updated 2 months ago
- The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".β246Updated last month
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]β236Updated last month
- This repository provides implementation to formalize and benchmark Prompt Injection attacks and defensesβ146Updated 2 months ago
- We jailbreak GPT-3.5 Turboβs safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20β¦β240Updated 9 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]β221Updated 2 months ago
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to aβ¦β313Updated 8 months ago
- Awesome LLM Jailbreak academic papersβ77Updated last year
- Papers about red teaming LLMs and Multimodal models.β78Updated this week
- An easy-to-use Python framework to generate adversarial jailbreak prompts.β484Updated 2 months ago
- π§ LLMFuzzer - Fuzzing Framework for Large Language Models π§ LLMFuzzer is the first open-source fuzzing framework specifically designed β¦β234Updated 9 months ago
- TAP: An automated jailbreaking method for black-box LLMsβ119Updated 8 months ago
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Modelsβ470Updated last month
- LLM security and privacyβ41Updated last month
- Every practical and proposed defense against prompt injection.β347Updated 5 months ago
- A curated list of MLSecOps tools, articles and other resources on security applied to Machine Learning and MLOps systems.β246Updated last month
- A fast + lightweight implementation of the GCG algorithm in PyTorchβ127Updated last month
- π up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.β134Updated last week
- The automated prompt injection framework for LLM-integrated applications.β163Updated 2 months ago
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Surveyβ77Updated 3 months ago
- List of papers on hallucination detection in LLMs.β682Updated this week
- Must-read Papers on Knowledge Editing for Large Language Models.β925Updated this week
- OWASP Foundation Web Respositoryβ583Updated this week
- Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Modelsβ674Updated 5 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMsβ185Updated 5 months ago