tjunlp-lab/Awesome-LLM-Safety-Papers

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/tjunlp-lab/Awesome-LLM-Safety-Papers)

tjunlp-lab / Awesome-LLM-Safety-Papers

☆56

Alternatives and similar repositories for Awesome-LLM-Safety-Papers

Users that are interested in Awesome-LLM-Safety-Papers are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated last year
DYR1 / MoGU
View on GitHub
Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.
☆18Jan 14, 2025Updated last year
rgreenblatt / model_organism_public
View on GitHub
☆15Jun 17, 2025Updated last year
EleutherAI / deep-ignorance
View on GitHub
☆20Jan 7, 2026Updated 6 months ago
MuyuenLP / AdaSteer
View on GitHub
EMNLP 25 Oral - AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
☆19Feb 7, 2026Updated 5 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
niconi19 / LLM-Conversation-Safety
View on GitHub
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆111Aug 7, 2024Updated last year
inistory / STONE-watermarking
View on GitHub
Official repository of the paper: Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code (Findings of EACL …
☆12Mar 26, 2026Updated 3 months ago
zepingyu0512 / in-context-mechanism
View on GitHub
code for EMNLP 2024 paper: How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for M…
☆13Nov 17, 2024Updated last year
aengusl / latent-adversarial-training
View on GitHub
☆48Sep 29, 2024Updated last year
NLie2 / what_features_jailbreak_LLMs
View on GitHub
☆18Mar 30, 2025Updated last year
baixianghuang / editing-attack
View on GitHub
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
☆21Dec 26, 2025Updated 6 months ago
baixianghuang / HalluEditBench
View on GitHub
Can Knowledge Editing Really Correct Hallucinations? (ICLR 2025)
☆26Aug 10, 2025Updated 11 months ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
View on GitHub
A survey on harmful fine-tuning attack for large language model (ACM CSUR)
☆247Jun 22, 2026Updated last month
LyWang12 / CUTI-Domain
View on GitHub
☆15Feb 11, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
OSU-NLP-Group / AgentSafety
View on GitHub
☆192Oct 31, 2025Updated 8 months ago
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
xjzzzzzzzz / MCPSafety
View on GitHub
☆22Dec 18, 2025Updated 7 months ago
arobey1 / smooth-llm
View on GitHub
☆135Nov 13, 2023Updated 2 years ago
privacyradius / ccpa-checklist
View on GitHub
The CCPA Checklist
☆15May 19, 2023Updated 3 years ago
Yu-Fangxu / COLD-Attack
View on GitHub
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176Dec 18, 2024Updated last year
zhang-wei-chao / DC-PDD
View on GitHub
This repository presents the original implementation of Pretraining Data Detection for Large Language Models: A Divergence-based Calibrat…
☆23May 21, 2025Updated last year
DyMessi / VisCRA
View on GitHub
☆19Dec 23, 2025Updated 7 months ago
joao-siilva / studies
View on GitHub
Descrição diário da toda minha trajetória de estudos
☆15Jan 30, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
AlignmentResearch / scaling-poisoning
View on GitHub
☆17Nov 18, 2024Updated last year
git-disl / Lisa
View on GitHub
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
☆28Sep 10, 2024Updated last year
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
View on GitHub
☆23May 14, 2025Updated last year
butterfly-community / oskey-firmware
View on GitHub
The firmware of the Butterfly OSKey, Free from vendor lock-in, supporting hundreds of chips from multiple manufacturers. The minimum supp…
☆42Feb 20, 2026Updated 5 months ago
zhuohuangai / SharpDRO
View on GitHub
Code for CVPR 2023 Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
☆13Mar 27, 2023Updated 3 years ago
AmbitYuki / RoFed-LLM
View on GitHub
Robust Federated Learning for Large Language Models in Adversarial Wireless Environments
☆16Mar 7, 2025Updated last year
DPamK / BadAgent
View on GitHub
☆33Feb 27, 2025Updated last year
cispa / gdpr-consent
View on GitHub
Code for our paper: "Share First, Ask Later (or Never?) - Studying Violations of GDPR's Explicit Consent in Android Apps"
☆12Oct 20, 2022Updated 3 years ago
AI4Good24 / PsySafe
View on GitHub
☆53Feb 8, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Maras13 / git_playground
View on GitHub
A hands-on repository for learning GitHub basics! Dive into beginner-friendly exercises that guide you through creating repositories, mak…
☆15Dec 3, 2024Updated last year
DennisLiu2022 / Membership-Inference-Attacks-by-Exploiting-Loss-Trajectory
View on GitHub
☆25Nov 14, 2022Updated 3 years ago
ltroin / llm_attack_defense_arena
View on GitHub
☆86Sep 5, 2025Updated 10 months ago
uiuc-kang-lab / InjecAgent
View on GitHub
☆153Jul 2, 2024Updated 2 years ago
platelminto / NudeNetClassifier
View on GitHub
A Neural Net for Nudity Detection. Classifier only.
☆19Jan 23, 2023Updated 3 years ago
antonsteenvoorden / ml1m-images
View on GitHub
URLs to each movie from the MovieLens-1M data set
☆10Mar 12, 2020Updated 6 years ago
liuzrcc / AIP
View on GitHub
Adversarial Item Promotion in visually-aware recommenders
☆17Sep 3, 2021Updated 4 years ago