Data and code for the paper: Finding Safety Neurons in Large Language Models
☆23Jan 29, 2026Updated last month
Alternatives and similar repositories for SafetyNeuron
Users who are interested in SafetyNeuron are comparing it to the libraries listed below
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron☆30Apr 30, 2025Updated 10 months ago
- Accepted by CVPR 2025 (highlight)☆24Jun 8, 2025Updated 9 months ago
- Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"☆17Mar 2, 2026Updated 3 weeks ago
- Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"☆15Aug 7, 2025Updated 7 months ago
- This is the official implementation for MA-LoT.☆19Aug 4, 2025Updated 7 months ago
- [EMNLP 2025] Circuit-Aware Editing Enables Generalizable Knowledge Learners☆19Nov 17, 2025Updated 4 months ago
- Implementation of "DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation" (accepted by NAACL 2024 Findings).☆29Feb 10, 2025Updated last year
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State☆11Sep 21, 2024Updated last year
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs☆19Mar 20, 2025Updated last year
- Official implementation of Language Models as Compilers: Simulating the Execution Of Pseudocode Improves Algorithmic Reasoning in Languag…☆23Apr 8, 2024Updated last year
- Official code for the paper Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception. The code is based on t…☆19Aug 5, 2025Updated 7 months ago
- Code for the paper Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation (CVPR 2023).☆34May 26, 2023Updated 2 years ago
- Enhancing contextual understanding in large language models through contrastive decoding☆20May 3, 2024Updated last year
- EraseDiff: Erasing Data Influence in Diffusion Models☆14Nov 20, 2024Updated last year
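- A record of two years of course materials from a software engineering program at a third-tier university. Onward, young one!☆10Dec 10, 2022Updated 3 years ago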
- howjul's notebook☆14Nov 15, 2024Updated last year
- Source code of BI-Mamba for cardiovascular disease detection from two-view chest X-rays☆14Dec 10, 2025Updated 3 months ago
- TuneTables is a tabular classifier that implements prompt tuning for frozen prior-fitted networks.☆23Mar 31, 2025Updated 11 months ago
- This project contains my personal solutions to the MIT 6.5940 course assignments, along with study notes and reflections.☆15Mar 1, 2024Updated 2 years ago
- Official Implementation of implicit reference attack☆11Oct 16, 2024Updated last year
- Xlore2.0 Code[BaiduExtractor, HudongExtractor, WikiExtractor, XloreData, XloreWeb]☆12Apr 5, 2017Updated 8 years ago
- ☆33Aug 28, 2024Updated last year
- ☆11Mar 24, 2023Updated 2 years ago
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.☆14Dec 16, 2024Updated last year
- [EMNLP 2025] Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking☆12Aug 22, 2025Updated 7 months ago
- [BMVC2024] Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning☆14Feb 14, 2026Updated last month
- TOKEN-IMPORTANCE GUIDED DIRECT PREFERENCE OPTIMIZATION☆24Jan 26, 2026Updated last month
- ☆12Jan 10, 2023Updated 3 years ago
- [AAAI'25] SPRING: Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models☆26Sep 24, 2025Updated 5 months ago
- ☆12Jan 25, 2025Updated last year
- Code for "SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation" (ICLR 2025)☆25Oct 23, 2025Updated 5 months ago
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information☆13Oct 1, 2024Updated last year
- Confidence Regulation Neurons in Language Models (NeurIPS 2024)☆15Feb 1, 2025Updated last year
- Papers about the trend of Entity Linking in recent years.☆11Sep 5, 2022Updated 3 years ago
- Codes for paper SoAy: A Service-oriented APIs Applying Framework of Large Language Models☆27Jul 14, 2025Updated 8 months ago
- ☆19May 14, 2025Updated 10 months ago
- ☆12Apr 25, 2024Updated last year
- ☆32Aug 9, 2024Updated last year
- Code for the paper: Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness☆12Oct 22, 2023Updated 2 years ago