WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining general capabilities.
☆171May 29, 2025Updated last year
Alternatives and similar repositories for wmdp
Users that are interested in wmdp are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆27Oct 6, 2024Updated last year
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"☆30Oct 1, 2024Updated last year
- [ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models☆49Dec 4, 2024Updated last year
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning☆543Mar 18, 2026Updated 2 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024☆95Sep 30, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- ☆32Aug 9, 2024Updated last year
- Improving Alignment and Robustness with Circuit Breakers☆262Sep 24, 2024Updated last year
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"☆67Jun 9, 2025Updated 11 months ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"☆19Dec 16, 2024Updated last year
- ☆45Oct 1, 2024Updated last year
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"☆45Oct 3, 2025Updated 7 months ago
- LLM Unlearning☆185Oct 20, 2023Updated 2 years ago
- ☆33Mar 13, 2025Updated last year
- A resource repository for machine unlearning in large language models☆588May 12, 2026Updated 2 weeks ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- ☆74Jul 15, 2024Updated last year
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data☆14Feb 26, 2025Updated last year
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated 4 months ago
- ☆19Jun 21, 2025Updated 11 months ago
- Constructing community of LLM-based Agent in the minecraft☆18May 22, 2026Updated last week
- ☆188Apr 22, 2026Updated last month
- ☆20Nov 15, 2024Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆150Jul 23, 2025Updated 10 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆953Aug 16, 2024Updated last year
- Steering Llama 2 with Contrastive Activation Addition☆232May 23, 2024Updated 2 years ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…☆350Feb 23, 2024Updated 2 years ago
- ☆45Mar 3, 2023Updated 3 years ago
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts☆40Sep 26, 2024Updated last year
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- Butler 是一个用于自动化服务管理和任务调度的工具项目。☆16May 16, 2026Updated last week
- ☆46May 9, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Official implement of ACL'25 Findings paper "MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Lang…☆25Jun 17, 2025Updated 11 months ago
- ☆48Sep 29, 2024Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆134Feb 24, 2025Updated last year
- Official code for ICML 2024 paper, "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models"☆19Jun 12, 2024Updated last year
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"☆33Jul 22, 2024Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆25Nov 29, 2024Updated last year
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.☆57Nov 13, 2023Updated 2 years ago