ydyjya / Awesome-LLM-SafetyLinks

A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide researchers, practitioners, and enthusiasts with insights into the safety implications, challenges, and advancements surrounding these powerful models.

☆1,651

Alternatives and similar repositories for Awesome-LLM-Safety

Users that are interested in Awesome-LLM-Safety are comparing it to the libraries listed below

Sorting:

CryptoAILab / Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
☆1,703Updated this week
chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
☆536Updated 4 months ago
corca-ai / awesome-llm-security
A curation of awesome tools, documents and projects about LLM Security.
☆1,424Updated 2 months ago
liudaizong / Awesome-LVLM-Attack
😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.
☆398Updated last week
EasyJailbreak / EasyJailbreak
An easy-to-use Python framework to generate adversarial jailbreak prompts.
☆735Updated 6 months ago
HillZhang1999 / llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …
☆1,052Updated 3 weeks ago
xianshang33 / llm-paper-daily
Daily updated LLM papers. 每日更新 LLM 相关的论文，欢迎订阅 👏 喜欢的话动动你的小手 🌟 一个
☆1,187Updated last year
SheltonLiu-N / AutoDAN
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆385Updated 9 months ago
JailbreakBench / jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆438Updated 6 months ago
zjunlp / KnowledgeEditingPapers
Must-read Papers on Knowledge Editing for Large Language Models.
☆1,180Updated 3 months ago
xingjunm / Awesome-Large-Model-Safety
Safety at Scale: A Comprehensive Survey of Large Model Safety
☆200Updated 8 months ago
whitzard-ai / jade-db
"他山之石、可以攻玉"：复旦白泽智能发布面向国内开源和国外商用大模型的Demo数据集JADE-DB
☆465Updated last week
sherdencooper / GPTFuzz
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆531Updated last year
chrisliu298 / awesome-llm-unlearning
A resource repository for machine unlearning in large language models
☆498Updated 3 months ago
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆209Updated 8 months ago
HowieHwong / TrustLLM
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
☆600Updated 4 months ago
gitkolento / SecProbe
SecProbe：任务驱动式大模型安全能力评测系统
☆14Updated 10 months ago
yueliu1999 / Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data…
☆980Updated 2 weeks ago
zepingyu0512 / awesome-llm-understanding-mechanism
awesome papers in LLM interpretability
☆564Updated 2 months ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attack for large language model
☆215Updated this week
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106Updated last year
thu-coai / SafetyBench
Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆257Updated 2 months ago
isXinLiu / Awesome-MLLM-Safety
Accepted by IJCAI-24 Survey Track
☆218Updated last year
hzy312 / Awesome-LLM-Watermark
UP-TO-DATE LLM Watermark paper. 🔥🔥🔥
☆358Updated 10 months ago
plll4zzx / Awesome-LLM-Watermark
A collection list for Large Language Model (LLM) Watermark
☆47Updated 8 months ago
PKU-YuanGroup / Reasoning-Attack
☆135Updated 7 months ago
HqWu-HITCS / Awesome-LLM-Survey
An Awesome Collection for LLM Survey
☆377Updated 4 months ago
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆212Updated last year
bboylyg / BackdoorLLM
[NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
☆223Updated last month
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆152Updated 10 months ago