chrisliu298 / awesome-llm-unlearning
A resource repository for machine unlearning in large language models
☆249 · Updated this week
Alternatives and similar repositories for awesome-llm-unlearning:
Users interested in awesome-llm-unlearning are comparing it to the libraries listed below.
- A survey on harmful fine-tuning attacks for large language models ☆105 · Updated last week
- Landing Page for TOFU ☆103 · Updated 6 months ago
- LLM Unlearning ☆130 · Updated last year
- UP-TO-DATE LLM Watermark papers. 🔥🔥🔥 ☆302 · Updated this week
- ☆40 · Updated 5 months ago
- Accepted by IJCAI-24 Survey Track ☆170 · Updated 3 months ago
- ☆28 · Updated 6 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆110 · Updated 2 weeks ago
- A curated list of LLM interpretability-related material - tutorials, libraries, surveys, papers, blogs, etc. ☆185 · Updated 2 months ago
- 😎 Up-to-date & curated list of awesome attacks on large vision-language models: papers, methods & resources. ☆155 · Updated this week
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873) ☆129 · Updated 7 months ago
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆78 · Updated 3 months ago
- Python package for measuring memorization in LLMs. ☆126 · Updated 3 weeks ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆256 · Updated 2 months ago
- A collection of automated evaluators for assessing jailbreak attempts. ☆79 · Updated this week
- ☆36 · Updated 6 months ago
- ☆86 · Updated last month
- Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024) ☆26 · Updated last month
- Accepted by ECCV 2024 ☆81 · Updated 2 months ago
- ☆34 · Updated last month
- The latest papers on detection of LLM-generated text and code ☆229 · Updated last week
- Official code for the paper "Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications" ☆62 · Updated 2 months ago
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding" ☆104 · Updated 4 months ago
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆83 · Updated 4 months ago
- A curated list of trustworthy Generative AI papers. Updated daily. ☆68 · Updated 3 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models (NeurIPS 2024) ☆62 · Updated 2 months ago
- Official code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆19 · Updated last year
- The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models" ☆262 · Updated last month
- Code for watermarking language models ☆73 · Updated 3 months ago
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆248 · Updated 9 months ago