git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attacks for large language models
☆70 · Updated this week
Related projects
Alternatives and complementary repositories for awesome_LLM-harmful-fine-tuning-papers
- ☆34 · Updated 3 months ago
- Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024) ☆16 · Updated 3 weeks ago
- Accepted by ECCV 2024 ☆73 · Updated 3 weeks ago
- A curated list of trustworthy Generative AI papers, updated daily ☆67 · Updated 2 months ago
- Official code for the paper "Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications" ☆58 · Updated last month
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆46 · Updated 3 months ago
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆70 · Updated 2 months ago
- LLM Unlearning ☆123 · Updated last year
- [ACL 2024 Main] Data and code for "WaterBench: Towards Holistic Evaluation of LLM Watermarks" ☆18 · Updated 11 months ago
- ☆15 · Updated 3 months ago
- Code and data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆42 · Updated last month
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆45 · Updated last month
- 😎 An up-to-date, curated list of awesome papers, methods, and resources on attacks against large vision-language models ☆125 · Updated this week
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆28 · Updated 2 weeks ago
- ☆30 · Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆83 · Updated 5 months ago
- ☆31 · Updated 4 months ago
- ☆66 · Updated 11 months ago
- ☆39 · Updated last week
- A resource repository for machine unlearning in large language models ☆210 · Updated this week
- Official code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆20 · Updated last year
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety ☆70 · Updated 6 months ago
- ☆28 · Updated 4 months ago
- [NeurIPS 2024] RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models ☆58 · Updated last month
- ☆26 · Updated 3 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆13 · Updated 4 months ago
- Official repo for the EMNLP 2024 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning" ☆13 · Updated last month
- Official repo of the ICLR 2024 paper "BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models" ☆16 · Updated 3 months ago
- A toolkit to assess data privacy in LLMs (under development) ☆41 · Updated last month
- A lightweight library for large language model (LLM) jailbreaking defense ☆38 · Updated 3 weeks ago