[ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
☆29 · Apr 30, 2025 · Updated 10 months ago
Alternatives and similar repositories for Safety-Neuron
Users that are interested in Safety-Neuron are comparing it to the libraries listed below
- The official repository of "Unnatural Languages Are Not Bugs but Features for LLMs" ☆24 · May 20, 2025 · Updated 9 months ago
- Confidence Regulation Neurons in Language Models (NeurIPS 2024) ☆15 · Feb 1, 2025 · Updated last year
- Data and code for the paper: Finding Safety Neurons in Large Language Models ☆22 · Jan 29, 2026 · Updated last month
- The official code repo for "Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets" in ICML 2025. ☆59 · Feb 12, 2026 · Updated 3 weeks ago
- [ICML 2023] "NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations" by Yonggan … ☆18 · Mar 10, 2024 · Updated 2 years ago
- Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin". ☆36 · Jul 10, 2025 · Updated 8 months ago
- ☆36 · Jun 13, 2025 · Updated 8 months ago
- ☆11 · Oct 15, 2024 · Updated last year
- DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model. ☆12 · May 29, 2023 · Updated 2 years ago
- REDSearch: A scalable, cost-efficient framework for long-horizon search agents. Features complex task synthesis, optimized mid-training, … ☆51 · Feb 26, 2026 · Updated last week
- Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023) ☆39 · Dec 24, 2023 · Updated 2 years ago
- ☆12 · Sep 28, 2023 · Updated 2 years ago
- [ECCV 2024] Characterizing Robustness via Natural Input Gradients ☆13 · Updated this week
- Video packaging platform: builds a Docker image with a web API that lets you upload, encrypt, and serve videos as MPEG-DASH files ☆11 · Sep 6, 2020 · Updated 5 years ago
- ☆10 · Mar 20, 2023 · Updated 2 years ago
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models ☆10 · Feb 20, 2025 · Updated last year
- ☆12 · Oct 17, 2024 · Updated last year
- ☆11 · Mar 24, 2023 · Updated 2 years ago
- ☆12 · Jan 25, 2025 · Updated last year
- This is the official repo of the paper "Latent Guard: a Safety Framework for Text-to-image Generation" ☆53 · Oct 24, 2024 · Updated last year
- SuperGS: Super-Resolution 3D Gaussian Splatting Enhanced by Variational Residual Features and Uncertainty-Augmented Learning ☆11 · May 24, 2025 · Updated 9 months ago
- [ICLR 2025] Adaptive prompt tailored pruning of T2I diffusion models. ☆15 · Feb 1, 2025 · Updated last year
- RESAnything: Attribute Prompting for Arbitrary Referring Segmentation ☆17 · Nov 28, 2025 · Updated 3 months ago
- ☆11 · May 27, 2020 · Updated 5 years ago
- [CVPR '23 Highlight] Official repository for the paper "Quantum Multi-Model Fitting". ☆11 · Mar 7, 2025 · Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts. ☆30 · Aug 22, 2025 · Updated 6 months ago
- The first toolkit for MLRM safety evaluation, providing a unified interface for mainstream models, datasets, and jailbreaking methods! ☆15 · Apr 8, 2025 · Updated 11 months ago
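- A powerful survey questionnaire system offering many question types. Questionnaires are built by drag-and-drop, with online preview, report queries, and more, and questions can jump to and show or hide one another. The community edition is now available, and some features are still being updated. Stay tuned! ☆10 · Oct 25, 2024 · Updated last year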
- Chinese Mammography Database (CMMD dataset) Deep Learning Classification Pipeline ☆15 · Mar 15, 2022 · Updated 3 years ago
- ☆13 · Oct 13, 2025 · Updated 4 months ago
- ☆11 · Apr 3, 2024 · Updated last year
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts". ☆19 · Oct 30, 2024 · Updated last year
- ☆11 · Nov 9, 2023 · Updated 2 years ago
- Source code of BI-Mamba for cardiovascular disease detection from two-view chest X-rays ☆14 · Dec 10, 2025 · Updated 2 months ago
- Code for "SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation" (ICLR 2025) ☆24 · Oct 23, 2025 · Updated 4 months ago
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding ☆35 · Jan 16, 2026 · Updated last month
- Simple solution based on Gradient Boosting and Random Forest, ranked 24/3251 (top 1%), in under 60 lines of Python code ☆14 · Jun 21, 2019 · Updated 6 years ago
- [NeurIPS 2023] Official PyTorch implementation for the paper "CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganog… ☆11 · Sep 28, 2023 · Updated 2 years ago
- ☆10 · Jan 18, 2024 · Updated 2 years ago