zhaoyiran924 / Safety-Neuron
[ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
☆27Apr 30, 2025Updated 9 months ago
Alternatives and similar repositories for Safety-Neuron
Users who are interested in Safety-Neuron compare it to the repositories listed below
- Confidence Regulation Neurons in Language Models (NeurIPS 2024)☆15Feb 1, 2025Updated last year
- Data and code for the paper: Finding Safety Neurons in Large Language Models☆21Jan 29, 2026Updated 2 weeks ago
- [ICML 2023] "NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations" by Yonggan …☆18Mar 10, 2024Updated last year
- Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin".☆35Jul 10, 2025Updated 7 months ago
- A Watermark-Conditioned Diffusion Model for IP Protection (ECCV 2024)☆34Apr 5, 2025Updated 10 months ago
- [EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"☆39Aug 20, 2025Updated 5 months ago
- ☆11Oct 15, 2024Updated last year
- ☆12Sep 28, 2023Updated 2 years ago
- ☆10Mar 20, 2023Updated 2 years ago
- ☆11Jun 5, 2024Updated last year
- [ECCV 2024] Characterizing Robustness via Natural Input Gradients☆13Oct 14, 2024Updated last year
- ☆11Mar 24, 2023Updated 2 years ago
- Code for the paper "Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction" …☆12Sep 6, 2023Updated 2 years ago
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models☆10Feb 20, 2025Updated 11 months ago
- ☆12Oct 17, 2024Updated last year
- Applications for OpenCL testing on Toradex Apalis iMX6Q☆12Dec 2, 2022Updated 3 years ago
- This is the official repo of the paper "Latent Guard: a Safety Framework for Text-to-image Generation"☆52Oct 24, 2024Updated last year
- Implementation of "Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models" [NeurIPS 2025]☆73Dec 17, 2025Updated 2 months ago
- A backpropagation (BP) neural network handwritten in NumPy, comparing the effects of training techniques such as Dropout and Batch Normalization.☆11Dec 19, 2019Updated 6 years ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".☆19Oct 30, 2024Updated last year
- ☆13Oct 13, 2025Updated 4 months ago
- RESAnything: Attribute Prompting for Arbitrary Referring Segmentation☆17Nov 28, 2025Updated 2 months ago
- Source code of BI-Mamba for cardiovascular disease detection from two-view chest X-rays☆14Dec 10, 2025Updated 2 months ago
- Cambridge Arboreal Modelling Panoptic 3D: Pipeline and Dataset☆25Sep 16, 2025Updated 5 months ago
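- A powerful survey questionnaire system with many question types to choose from; drag to generate, with support for online preview, report queries, and more. Supports jumps between questions and conditional display of questions. The community edition is live, and some features are still being updated. Stay tuned!☆10Oct 25, 2024Updated last year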
- SuperGS: Super-Resolution 3D Gaussian Splatting Enhanced by Variational Residual Features and Uncertainty-Augmented Learning☆11May 24, 2025Updated 8 months ago
- The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods!☆14Apr 8, 2025Updated 10 months ago
- ☆11Apr 3, 2024Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts.☆28Aug 22, 2025Updated 5 months ago
- [USENIX Security 2022] Mitigating Membership Inference Attacks by Self-Distillation Through a Novel Ensemble Architecture☆16Aug 29, 2022Updated 3 years ago
- simple solution based on Gradient Boost and Random Forest, rank 24/3251 (top 1%) within 60 lines of python code☆14Jun 21, 2019Updated 6 years ago
- Have an LLM write your biography, probably incorrectly☆13Dec 26, 2024Updated last year
- ☆14Nov 7, 2022Updated 3 years ago
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding☆34Jan 16, 2026Updated last month
- Reproduction of the paper "Towards Deep Learning Models Resistant to Adversarial Attacks"☆15Aug 18, 2021Updated 4 years ago
- A reinforcement learning course, mainly about how to solve problems with reinforcement learning☆15Dec 10, 2024Updated last year
- Code for the ICCV 2025 paper "IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves"☆17Jul 11, 2025Updated 7 months ago
- Survey on LLM Inference via Search (TMLR 2025)☆14May 6, 2025Updated 9 months ago
- Code and full version of the paper "Hijacking Attacks against Neural Network by Analyzing Training Data"☆14Feb 28, 2024Updated last year