[ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neurons
☆31 Apr 30, 2025 Updated last year
Alternatives and similar repositories for Safety-Neuron
Users that are interested in Safety-Neuron are comparing it to the libraries listed below.
- The official repository of 'Unnatural Languages Are Not Bugs but Features for LLMs' ☆24 May 20, 2025 Updated 11 months ago
- The official code repo for "Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets" in ICML 2025. ☆58 Feb 12, 2026 Updated 2 months ago
- Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin". ☆36 Jul 10, 2025 Updated 10 months ago
- [ICLR 2025] Adaptive prompt tailored pruning of T2I diffusion models. ☆15 Feb 1, 2025 Updated last year
- [CVPR '23 Highlight] Official repository for the paper "Quantum Multi-Model Fitting". ☆11 Mar 7, 2025 Updated last year
- ☆12 Jun 5, 2024 Updated last year
- [ICML 2023] "NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations" by Yonggan … ☆18 Mar 10, 2024 Updated 2 years ago
- [EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆39 Aug 20, 2025 Updated 8 months ago
- In-Context Reinforcement Learning for Tool Use in Large Language Models ☆46 Mar 26, 2026 Updated last month
- ☆12 Sep 28, 2023 Updated 2 years ago
- ☆11 Oct 15, 2024 Updated last year