Some code for "Stealing Part of a Production Language Model"
☆22 · Mar 20, 2024 · Updated 2 years ago
Alternatives and similar repositories for stealing-part-lm-supplementary
Users that are interested in stealing-part-lm-supplementary are comparing it to the libraries listed below
- The official implementation of the EMNLP 2023 paper "Paraphrase Types for Generation and Detection" ☆12 · Oct 20, 2024 · Updated last year
- icml24 ☆14 · Feb 24, 2025 · Updated last year
- [USENIX Security 2025] SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks ☆20 · Sep 18, 2025 · Updated 6 months ago
- Web Client for Instaclone built with React, Apollo, Styled Components and more! ☆10 · Mar 19, 2021 · Updated 5 years ago
- Code for "CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples" (NDSS 2020) ☆22 · Nov 14, 2020 · Updated 5 years ago
- ☆19 · Feb 25, 2024 · Updated 2 years ago
- Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models" ☆13 · Jul 18, 2024 · Updated last year
- This is the implementation code for the WWW2021 paper "Variation Control and Evaluation for Generative Slate Recommendation" ☆15 · Jun 7, 2021 · Updated 4 years ago
- Pytorch implementation of NPAttack ☆12 · Jul 7, 2020 · Updated 5 years ago
- Playing around with various jailbreaking techniques ahead of the Gray Swan AI Ultimate Jailbreaking Competition ☆18 · Oct 6, 2024 · Updated last year
- Find context neurons in Pythia models. ☆13 · Jun 13, 2023 · Updated 2 years ago
- ICML2025 | From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models ☆35 · Sep 17, 2025 · Updated 6 months ago
- Tutorial by Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia and Felice Antonio Merra about Adversarial Machine Learning in Recommende… ☆25 · Apr 12, 2021 · Updated 4 years ago
- 2022 Huawei Software Elite Challenge - Hangzhou-Xiamen Division - team 土豪法称霸杭厦 - third place in the finals ☆14 · Jul 31, 2023 · Updated 2 years ago
- Defending against Model Stealing via Verifying Embedded External Features ☆38 · Feb 19, 2022 · Updated 4 years ago
- Official repository for "On the Multi-modal Vulnerability of Diffusion Models" ☆16 · Jul 15, 2024 · Updated last year
- ☆17 · Aug 2, 2022 · Updated 3 years ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆116 · Jun 13, 2024 · Updated last year
- [ICML 2024] Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations ☆15 · Oct 28, 2023 · Updated 2 years ago
- ☆18 · Dec 12, 2025 · Updated 3 months ago
- Source code for "Neural Anisotropy Directions" ☆16 · Nov 17, 2020 · Updated 5 years ago
- All code and data necessary to replicate experiments in the paper BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Model… ☆13 · Sep 16, 2024 · Updated last year
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning ☆11 · Oct 29, 2024 · Updated last year
- Accepted to ICLR 2025. MetaMetrics is a calibrated meta-metric designed to evaluate generation tasks across different modalities aligned … ☆14 · Dec 30, 2024 · Updated last year
- Implementations of 3 phishing detection and identification baselines ☆21 · Nov 25, 2024 · Updated last year
- Example code of [Tianchi AAAI2022 Security AI Challenger Program Phase 8] ☆22 · Feb 9, 2022 · Updated 4 years ago
- [USENIX Security'24] REMARK-LLM: A robust and efficient watermarking framework for generative large language models ☆27 · Oct 23, 2024 · Updated last year
- A coverage library for Chisel designs ☆11 · Mar 12, 2020 · Updated 6 years ago
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" ☆22 · May 6, 2025 · Updated 10 months ago
- This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs" ☆32 · Sep 13, 2024 · Updated last year
- Programs generated by ChatGPT ☆27 · Jul 19, 2023 · Updated 2 years ago
- Official code for reproducing the NeurIPS 2023 paper: Adversarial Examples Are Not Real Features ☆16 · Jun 27, 2024 · Updated last year
- Graph Theory Algorithm is implemented in python. Jupyter Notebook is used to demonstrate the concept and Networkx library is used in seve… ☆26 · Oct 15, 2018 · Updated 7 years ago
- CVPR 2023 generalist ☆16 · Oct 25, 2023 · Updated 2 years ago
- ☆15 · Dec 12, 2022 · Updated 3 years ago
- [CCS-LAMPS'24] LLM IP Protection Against Model Merging ☆16 · Oct 14, 2024 · Updated last year
- (AAAI 2024) Transferable Adversarial Attacks for Object Detection using Object-Aware Significant Feature Distortion ☆16 · Dec 13, 2023 · Updated 2 years ago
- ☆23 · Mar 11, 2025 · Updated last year
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks ☆18 · Apr 24, 2024 · Updated last year