WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining general capabilities.
☆169May 29, 2025Updated 11 months ago
Alternatives and similar repositories for wmdp
Users that are interested in wmdp are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆27Oct 6, 2024Updated last year
- [ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models☆48Dec 4, 2024Updated last year
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning☆533Mar 18, 2026Updated last month
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024☆94Sep 30, 2024Updated last year
- ☆31Aug 9, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Improving Alignment and Robustness with Circuit Breakers☆261Sep 24, 2024Updated last year
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"☆66Jun 9, 2025Updated 11 months ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"☆19Dec 16, 2024Updated last year
- ☆44Oct 1, 2024Updated last year
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"☆44Oct 3, 2025Updated 7 months ago
- LLM Unlearning☆184Oct 20, 2023Updated 2 years ago
- ☆33Mar 13, 2025Updated last year
- A resource repository for machine unlearning in large language models☆579May 1, 2026Updated last week
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data☆14Feb 26, 2025Updated last year
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated 3 months ago
- ☆19Jun 21, 2025Updated 10 months ago
- Constructing community of LLM-based Agent in the minecraft☆18Nov 27, 2025Updated 5 months ago
- ☆188Apr 22, 2026Updated 2 weeks ago
- ☆20Nov 15, 2024Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- ☆150Jul 23, 2025Updated 9 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆940Aug 16, 2024Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Steering Llama 2 with Contrastive Activation Addition☆228May 23, 2024Updated last year
- ☆43May 9, 2025Updated last year
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…☆350Feb 23, 2024Updated 2 years ago
- ☆45Mar 3, 2023Updated 3 years ago
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts☆40Sep 26, 2024Updated last year
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- Butler 是一个用于自动化服务管理和任务调度的工具项目。☆16Updated this week
- Official implement of ACL'25 Findings paper "MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Lang…☆24Jun 17, 2025Updated 10 months ago
- ☆48Sep 29, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆133Feb 24, 2025Updated last year
- Official code for ICML 2024 paper, "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models"☆19Jun 12, 2024Updated last year
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"☆33Jul 22, 2024Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆25Nov 29, 2024Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs☆24Oct 18, 2024Updated last year
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.☆57Nov 13, 2023Updated 2 years ago
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.☆93May 9, 2024Updated 2 years ago