THU-KEG / SafetyNeuron
Data and code for the paper: Finding Safety Neurons in Large Language Models
☆18 · Updated last year
Alternatives and similar repositories for SafetyNeuron
Users that are interested in SafetyNeuron are comparing it to the libraries listed below
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆65 · Updated last year
- ☆32 · Updated 9 months ago
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…