WMDP is an LLM proxy benchmark for hazardous knowledge in biosecurity, cybersecurity, and chemical security. We also release code for RMU, an unlearning method that reduces LLM performance on WMDP while retaining general capabilities.
☆175 · May 29, 2025 · Updated last year
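As described in the WMDP paper, RMU's objective has two terms: a forget term that pushes the model's activations on hazardous (forget-set) text toward a fixed random direction `u` scaled by a coefficient `c`, and a retain term, weighted by `alpha`, that keeps activations on benign (retain-set) text close to those of a frozen copy of the original model. The pure-Python sketch below computes that scalar loss on precomputed activation vectors, assuming each activation is a plain list of floats taken from one chosen layer. The function names are hypothetical and this is not the wmdp repo's implementation; the released code may differ in details such as normalization.

```python
import math
import random
from typing import List, Sequence

# Hypothetical helpers: a sketch of the RMU loss, not the wmdp repo's API.
Vector = List[float]


def random_unit_vector(dim: int, seed: int = 0) -> Vector:
    """Control direction u: entries drawn uniformly from [0, 1), then L2-normalized."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def mean_sq_dist(acts: Sequence[Vector], targets: Sequence[Vector]) -> float:
    """Squared L2 distance between each token's activation and its target, averaged over tokens."""
    if not acts:
        raise ValueError("need at least one token activation")
    total = 0.0
    for a, t in zip(acts, targets):
        total += sum((ai - ti) ** 2 for ai, ti in zip(a, t))
    return total / len(acts)


def rmu_loss(
    forget_acts: Sequence[Vector],         # updated model, forget-set tokens
    retain_acts: Sequence[Vector],         # updated model, retain-set tokens
    frozen_retain_acts: Sequence[Vector],  # frozen original model, same retain-set tokens
    u: Vector,
    c: float,
    alpha: float,
) -> float:
    # Forget term: steer forget-set activations toward the scaled random direction c * u.
    target = [c * x for x in u]
    forget = mean_sq_dist(forget_acts, [target] * len(forget_acts))
    # Retain term: keep retain-set activations close to the frozen model's activations.
    retain = mean_sq_dist(retain_acts, frozen_retain_acts)
    return forget + alpha * retain
```

In actual training, the forget and retain activations come from the model being updated and gradients flow only into that model; the frozen copy supplies fixed targets. This sketch only evaluates the loss value.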
Alternatives and similar repositories for wmdp
Users interested in wmdp are comparing it to the libraries listed below.
- ☆27 · Oct 6, 2024 · Updated last year
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning" ☆30 · Oct 1, 2024 · Updated last year
- [ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models ☆49 · Dec 4, 2024 · Updated last year
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning ☆551 · Mar 18, 2026 · Updated 3 months ago
- ☆32 · Aug 9, 2024 · Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 ☆98 · Sep 30, 2024 · Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆263 · Sep 24, 2024 · Updated last year
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆67 · Jun 9, 2025 · Updated last year
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models" ☆19 · Dec 16, 2024 · Updated last year
- ☆45 · Oct 1, 2024 · Updated last year
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning" ☆45 · Oct 3, 2025 · Updated 8 months ago
- LLM Unlearning ☆185 · Oct 20, 2023 · Updated 2 years ago
- ☆34 · Mar 13, 2025 · Updated last year
- A resource repository for machine unlearning in large language models ☆598 · Jun 10, 2026 · Updated last week
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 · Dec 13, 2024 · Updated last year
- ☆75 · Jul 15, 2024 · Updated last year
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data ☆14 · Feb 26, 2025 · Updated last year
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024) ☆50 · Jan 15, 2026 · Updated 5 months ago
- ☆19 · Jun 21, 2025 · Updated 11 months ago
- A Soul-grounded Minecraft social simulation runtime where Mineflayer actors pursue LifeGoals through evidence-backed action skills and tr… ☆23 · Updated this week
- ☆189 · Apr 22, 2026 · Updated last month
- ☆20 · Nov 15, 2024 · Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Jul 17, 2024 · Updated last year
- ☆150 · Jul 23, 2025 · Updated 10 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆981 · Aug 16, 2024 · Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆236 · May 23, 2024 · Updated 2 years ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆353 · Feb 23, 2024 · Updated 2 years ago
- ☆45 · Mar 3, 2023 · Updated 3 years ago
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts ☆40 · Sep 26, 2024 · Updated last year
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024. ☆16 · Oct 28, 2024 · Updated last year
- Butler is a tool for automated service management and task scheduling. ☆16 · Jun 11, 2026 · Updated last week
- ☆54 · May 9, 2025 · Updated last year
- Official implementation of ACL'25 Findings paper "MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Lang… ☆25 · Jun 17, 2025 · Updated last year
- ☆48 · Sep 29, 2024 · Updated last year
- Official code for ICML 2024 paper, "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models" ☆19 · Jun 12, 2024 · Updated 2 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆136 · Feb 24, 2025 · Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆26 · Nov 29, 2024 · Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆24 · Oct 18, 2024 · Updated last year
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective" ☆34 · Jul 22, 2024 · Updated last year