A survey on harmful fine-tuning attacks for large language models
☆233 · Updated Feb 25, 2026
Alternatives and similar repositories for awesome_LLM-harmful-fine-tuning-papers
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated last month
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)☆26Sep 10, 2024Updated last year
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…☆36Mar 22, 2025Updated 11 months ago
- ☆19 · Updated Jun 21, 2025
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep" ☆174 · Updated Apr 23, 2025
- ☆24 · Updated Dec 8, 2024
- ☆14 · Updated Feb 26, 2025
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆343 · Updated Feb 23, 2024
- Code for the paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025) ☆22 · Updated Apr 26, 2025
- ☆20 · Updated Oct 28, 2025
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide… ☆1,789 · Updated this week
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆82 · Updated this week
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆18 · Updated Jan 14, 2025
- NeurIPS'24 - LLM Safety Landscape ☆39 · Updated Oct 21, 2025