ftramer / LM_Memorization
Training data extraction on GPT-2
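The underlying attack (Carlini et al., "Extracting Training Data from Large Language Models", re-implemented in one of the projects below) generates many samples from GPT-2 and ranks them by membership-inference-style scores to surface likely memorized text. Here is a minimal sketch of one such score, the zlib ratio, assuming the HuggingFace `transformers` package; this is not the repo's exact code, and the helper names are mine:

```python
# Minimal sketch of the zlib-ratio memorization score (assumed helper names,
# not the repo's actual API). Requires: pip install torch transformers
import math
import zlib

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Model perplexity: exponentiated cross-entropy on the text."""
    ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # labels are shifted internally
    return torch.exp(loss).item()

def zlib_entropy_bits(text: str) -> int:
    """Compressed size in bits: a crude, model-free entropy estimate."""
    return 8 * len(zlib.compress(text.encode("utf-8")))

def zlib_score(text: str) -> float:
    """Higher = more suspicious: high-entropy text with low perplexity."""
    return zlib_entropy_bits(text) / math.log(perplexity(text))

# Rank generated samples; the top ones are candidates for memorized text.
samples = ["example generated sample one", "example generated sample two"]
for s in sorted(samples, key=zlib_score, reverse=True):
    print(f"{zlib_score(s):8.1f}  {s!r}")
```

The intuition behind the ratio: high zlib entropy means the text is not trivially repetitive, so a low model perplexity on it is better explained by memorization than by easy-to-predict structure.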
Related projects:
- Python package for measuring memorization in LLMs.
- Repo for the arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers".
- A codebase that makes differentially private training of transformers easy.
- Starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
- Differentially private transformers using HuggingFace and Opacus.
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
- A re-implementation of the "Extracting Training Data from Large Language Models" paper by Carlini et al., 2020.
- A survey of privacy problems in Large Language Models (LLMs), with summaries of the corresponding papers and relevant code.
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety.
- Official repository for the NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense".
- Code for the paper "Weight Poisoning Attacks on Pre-trained Models" (ACL 2020).
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment".
- Landing page for TOFU, a benchmark for unlearning in LLMs.
- A resource repository for machine unlearning in large language models.
- LLM Unlearning.
- Finding trojans in aligned LLMs; official repository for the competition hosted at SaTML 2024.
- Run safety benchmarks against AI models and view detailed reports showing how well they performed.
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models".
- Repository for research in the field of Responsible NLP at Meta.
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models.
- Code for watermarking language models.
- Aligning AI With Shared Human Values (ICLR 2021).
- Starter kit and data-loading code for the Trojan Detection Challenge NeurIPS 2022 competition.
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security; the repo also releases code for RMU, an unlearning method.