Code to replicate the Representation Noising paper and tools for evaluating defences against harmful fine-tuning
☆24Dec 12, 2024Updated last year
Alternatives and similar repositories for representation-noising
Users that are interested in representation-noising are comparing it to the libraries listed below
Sorting:
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated last month
- [ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs☆13Jun 20, 2025Updated 8 months ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"☆67Jun 9, 2025Updated 9 months ago
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)☆26Sep 10, 2024Updated last year
- The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods!☆15Apr 8, 2025Updated 11 months ago
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…☆36Mar 22, 2025Updated 11 months ago
- [NDSS'25] The official implementation of safety misalignment.☆17Jan 8, 2025Updated last year
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? (CVPR 2021)☆14Jul 16, 2021Updated 4 years ago
- [ICLR 2024] Towards Elminating Hard Label Constraints in Gradient Inverision Attacks☆14Feb 6, 2024Updated 2 years ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆89Mar 30, 2025Updated 11 months ago
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting☆18Apr 15, 2025Updated 10 months ago
- CVPR 2025 - R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning☆22Aug 28, 2025Updated 6 months ago
- ☆24Feb 17, 2026Updated 2 weeks ago
- ☆24Dec 8, 2024Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆70Feb 22, 2024Updated 2 years ago
- Program and links to the material for the GloBIAS Training School 2025, Kobe, Japan.☆22Oct 27, 2025Updated 4 months ago
- MirMachine, a command line tool to detect microRNA homologs in genome sequences.☆13Updated this week
- Un chat que construimos en vivo en https://twitch.tv/xabadu 📺🍅🔥☆10Mar 5, 2023Updated 3 years ago
- Official Implementation of "Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts" at EMNLP 202…☆13Oct 27, 2024Updated last year
- Chaos Magick Sigils☆15Jan 30, 2026Updated last month
- Documentation at☆14Mar 27, 2025Updated 11 months ago
- Colecciones para el tutorial Electrónica digital para Makers con FPGAs Libres☆11Dec 4, 2018Updated 7 years ago
- Please star this and feel free to look up on mario maker☆12Jan 24, 2023Updated 3 years ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆16Dec 10, 2024Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- Identification of the Adversary from a Single Adversarial Example (ICML 2023)☆10Jul 15, 2024Updated last year
- ICCV 2021, We find most existing triggers of backdoor attacks in deep learning contain severe artifacts in the frequency domain. This Rep…☆48Apr 27, 2022Updated 3 years ago
- ☆44Oct 1, 2024Updated last year
- Base Formality libraries☆10Mar 4, 2019Updated 7 years ago
- A method for training neural networks that are provably robust to adversarial attacks. [IJCAI 2019]☆10Sep 3, 2019Updated 6 years ago
- ☆14Dec 1, 2025Updated 3 months ago
- A WIP library to control B1500 and similar testers via the VISA protocol, built on pyvisa☆11Nov 4, 2016Updated 9 years ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- Code used to produce experimental results for the paper "Deep Structured Prediction with Nonlinear Output Activations"☆11May 6, 2019Updated 6 years ago
- ☆19May 14, 2025Updated 9 months ago
- ☆14Feb 26, 2025Updated last year
- My notes on natural history, science, and technology.☆17Dec 21, 2025Updated 2 months ago
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews☆17Dec 14, 2025Updated 2 months ago
- ☆10Oct 31, 2022Updated 3 years ago