WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining general capabilities.
☆164May 29, 2025Updated 10 months ago
Alternatives and similar repositories for wmdp
Users that are interested in wmdp are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆27Oct 6, 2024Updated last year
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"☆30Oct 1, 2024Updated last year
- [NeurIPS D&B '25] The one-stop repository for LLM unlearning☆512Mar 18, 2026Updated last week
- [ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models☆46Dec 4, 2024Updated last year
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024☆92Sep 30, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- ☆32Aug 9, 2024Updated last year
- Improving Alignment and Robustness with Circuit Breakers☆258Sep 24, 2024Updated last year
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"☆67Jun 9, 2025Updated 9 months ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"☆19Dec 16, 2024Updated last year
- ☆44Oct 1, 2024Updated last year
- [NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"☆44Oct 3, 2025Updated 5 months ago
- LLM Unlearning☆182Oct 20, 2023Updated 2 years ago
- ☆32Mar 13, 2025Updated last year
- A resource repository for machine unlearning in large language models☆558Mar 22, 2026Updated last week
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆74Jul 15, 2024Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data☆14Feb 26, 2025Updated last year
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated 2 months ago
- ☆19Jun 21, 2025Updated 9 months ago
- Constructing community of LLM-based Agent in the minecraft☆17Nov 27, 2025Updated 4 months ago
- ☆187Nov 17, 2025Updated 4 months ago
- ☆20Nov 15, 2024Updated last year
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆890Aug 16, 2024Updated last year
- ☆147Jul 23, 2025Updated 8 months ago
- Steering Llama 2 with Contrastive Activation Addition☆220May 23, 2024Updated last year
- [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts☆38Sep 26, 2024Updated last year
- ☆41May 9, 2025Updated 10 months ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…☆345Feb 23, 2024Updated 2 years ago
- ☆45Mar 3, 2023Updated 3 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- Butler 是一个用于自动化服务管理和任务调度的工具项目。☆16Updated this week
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- Official implement of ACL'25 Findings paper "MMUnlearner: Reformulating Multimodal Machine Unlearning in the Era of Multimodal Large Lang…☆22Jun 17, 2025Updated 9 months ago
- ☆48Sep 29, 2024Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆131Feb 24, 2025Updated last year
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"☆33Jul 22, 2024Updated last year
- Official code for ICML 2024 paper, "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models"☆19Jun 12, 2024Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆24Nov 29, 2024Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs☆20Oct 18, 2024Updated last year