Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10Jul 15, 2024Updated last year
Alternatives and similar repositories for adv_tracing
Users that are interested in adv_tracing are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)☆27Nov 18, 2024Updated last year
- ☆42Jul 16, 2024Updated last year
- The official code repository for the paper "CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments…☆30Dec 10, 2025Updated 3 months ago
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated 2 months ago
- ☆14Apr 14, 2025Updated 11 months ago
- Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation (NeurIPS 2022)☆33Dec 16, 2022Updated 3 years ago
- [NeurIPS 2025] Implementation for paper "Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text"☆37Jun 10, 2025Updated 9 months ago
- Context-central multi-agent framework with PyTorch-like API. Build intelligent agent systems with minimal code.☆74Oct 26, 2025Updated 4 months ago
- Code for our ICLR 2023 paper Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples.☆18May 31, 2023Updated 2 years ago
- Codes for the EMNLP2021 paper: Benchmarking Commonsense Knowledge Base Population (https://aclanthology.org/2021.emnlp-main.705.pdf). An …☆26Feb 14, 2024Updated 2 years ago
- ☆35Sep 13, 2023Updated 2 years ago
- Implementing a Neural Network from Scratch☆12Jul 17, 2017Updated 8 years ago
- Blogs that I'm actively following.☆14Sep 17, 2023Updated 2 years ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs☆12Nov 7, 2024Updated last year
- ☆14Feb 26, 2025Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025)☆26Feb 25, 2025Updated last year
- A C++ implementation of a Generalized Suffix Tree using Ukkonen's algorithm☆22Jul 11, 2016Updated 9 years ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search☆17Jan 24, 2026Updated 2 months ago
- Please go to https://github.com/facebookresearch/stable_signature☆15Jul 26, 2023Updated 2 years ago
- [ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs☆13Jun 20, 2025Updated 9 months ago
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews☆17Dec 14, 2025Updated 3 months ago
- ☆48Sep 29, 2024Updated last year
- ☆19May 14, 2025Updated 10 months ago
- ☆52Oct 23, 2023Updated 2 years ago
- ☆10Apr 28, 2020Updated 5 years ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…☆10Feb 7, 2026Updated last month
- [NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes"☆10Sep 18, 2025Updated 6 months ago
- A repository for the query-efficient black-box attack, SignHunter☆22Jan 15, 2020Updated 6 years ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…☆13Jan 26, 2025Updated last year
- [EMNLP'22] Textual Manifold-based Defense Against Natural Language Adversarial Examples☆11Apr 6, 2023Updated 2 years ago
- [NeurIPS 2023] Official repository for "Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models"☆11Jun 18, 2024Updated last year
- Demo code for the paper: One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features☆12Nov 30, 2023Updated 2 years ago
- Code for LLM_Catastrophic_Forgetting via SAM.☆11Jun 7, 2024Updated last year
- ☆58Jun 30, 2023Updated 2 years ago
- ☆11Jun 20, 2023Updated 2 years ago
- [ECCV-2024] Transferable Targeted Adversarial Attack, CLIP models, Generative adversarial network, Multi-target attacks☆38Apr 23, 2025Updated 11 months ago
- Representation Surgery for Multi-Task Model Merging. ICML, 2024.