☆48 · Feb 8, 2025 · Updated last year
Alternatives and similar repositories for InfoDeletionAttacks
Users who are interested in InfoDeletionAttacks are comparing it to the repositories listed below.
- ☆60 · Mar 9, 2023 · Updated 3 years ago
- [EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents ☆16 · Sep 16, 2025 · Updated 5 months ago
- OmniByteFormer is a generalized Transformer model that can process any type of data by converting it into byte sequences, bypassing tradi… ☆15 · Mar 2, 2026 · Updated last week
- ☆15 · Apr 7, 2023 · Updated 2 years ago
- ☆13 · Nov 8, 2022 · Updated 3 years ago
- [NeurIPS'22] Official Repository for Characterizing Datapoints via Second-Split Forgetting ☆16 · Aug 11, 2023 · Updated 2 years ago
- [ICLR2025] Detecting Backdoor Samples in Contrastive Language Image Pretraining ☆19 · Feb 26, 2025 · Updated last year
- ☆11 · Apr 4, 2023 · Updated 2 years ago
- CMD: a framework for Context-aware Model self-Detoxification (EMNLP2024 Long Paper) ☆17 · Feb 10, 2025 · Updated last year
- ☆17 · Nov 30, 2022 · Updated 3 years ago
- Code for the paper "Quantifying Privacy Leakage in Graph Embedding" published in MobiQuitous 2020 ☆17 · Nov 11, 2021 · Updated 4 years ago
- ☆65 · Sep 29, 2024 · Updated last year
- code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025) ☆22 · Apr 26, 2025 · Updated 10 months ago
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred… ☆104 · Aug 13, 2024 · Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Mar 31, 2025 · Updated 11 months ago
- The code for the ACL 2023 paper "Linear Classifier: An Often-Forgotten Baseline for Text Classification". ☆19 · Jun 29, 2024 · Updated last year
- Applying Reinforcement Learning from Human Feedback to language models to teach them to write short story responses to writing prompts. ☆14 · May 5, 2022 · Updated 3 years ago
- ☆44 · Apr 25, 2023 · Updated 2 years ago
- ☆46 · Jul 14, 2024 · Updated last year
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆22 · Sep 21, 2025 · Updated 5 months ago
- NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment ☆22 · Mar 10, 2024 · Updated last year
- Code for the CSF 2018 paper "Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting" ☆39 · Jan 28, 2019 · Updated 7 years ago
- For Certified Robustness to Text Adversarial Attacks by Randomized [MASK] ☆17 · Oct 8, 2024 · Updated last year
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆87 · Sep 12, 2024 · Updated last year
- Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers" ☆174 · May 4, 2024 · Updated last year
- ☆27 · Feb 25, 2025 · Updated last year
- Code for IJCAI 2019 paper "Real-time Adversarial Attack". ☆20 · Jul 4, 2020 · Updated 5 years ago
- ☆45 · Nov 10, 2019 · Updated 6 years ago
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning ☆20 · Jan 24, 2024 · Updated 2 years ago
- Official Implementation of ICLR 2022 paper, "Adversarial Unlearning of Backdoors via Implicit Hypergradient" ☆53 · Nov 16, 2022 · Updated 3 years ago
- ☆301 · Jan 13, 2026 · Updated last month
- quick playground to animate pippin ☆15 · Nov 11, 2024 · Updated last year
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight) ☆27 · Nov 18, 2024 · Updated last year
- This repo keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on … ☆98 · Oct 18, 2022 · Updated 3 years ago
- LLM Unlearning ☆182 · Oct 20, 2023 · Updated 2 years ago
- ☆59 · Jun 17, 2020 · Updated 5 years ago
- [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. ☆2,732 · Updated this week
- Source code for the paper "Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness" ☆25 · Feb 12, 2020 · Updated 6 years ago
- Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis" ☆66 · Oct 27, 2024 · Updated last year