Data and code for the paper: Finding Safety Neurons in Large Language Models
☆21Jan 29, 2026Updated last month
Alternatives and similar repositories for SafetyNeuron
Users that are interested in SafetyNeuron are comparing it to the libraries listed below
Sorting:
- DICE: Detecting In-distribution Data Contamination with LLM's Internal State☆11Sep 21, 2024Updated last year
- Accept by CVPR 2025 (highlight)☆22Jun 8, 2025Updated 8 months ago
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron☆28Apr 30, 2025Updated 10 months ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs☆19Mar 20, 2025Updated 11 months ago
- TuneTables is a tabular classifier that implements prompt tuning for frozen prior-fitted networks.☆23Mar 31, 2025Updated 11 months ago
- Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"☆15Aug 7, 2025Updated 6 months ago
- Implementaiton of "DiLM: Distilling Dataset into Language Model for Text-level Dataset Distillation" (accepted by NAACL2024 Findings)".☆28Feb 10, 2025Updated last year
- Codes for paper SoAy: A Service-oriented APIs Applying Framework of Large Language Models☆27Jul 14, 2025Updated 7 months ago
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings]☆24Nov 18, 2023Updated 2 years ago
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- Code for the paper Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation (CVPR 2023).☆34May 26, 2023Updated 2 years ago
- Concurrency library☆17Oct 13, 2024Updated last year
- ☆11Dec 23, 2024Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Oct 14, 2024Updated last year
- IntructIR, a novel benchmark specifically designed to evaluate the instruction following ability in information retrieval models. Our foc…☆32Jun 13, 2024Updated last year
- ☆36Apr 30, 2024Updated last year
- Models for packages and the resources they contain.☆14Mar 10, 2024Updated last year
- ☆12Oct 29, 2023Updated 2 years ago
- ☆10Apr 7, 2024Updated last year
- [AAAI2024] An official pytorch implement of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst…☆13Dec 8, 2024Updated last year
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆51Dec 23, 2024Updated last year
- Scaffold Prompting to promote LMMs☆46Dec 16, 2024Updated last year
- Python Inference Script(PyIS)☆19Aug 30, 2022Updated 3 years ago
- Code for the paper: Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness☆12Oct 22, 2023Updated 2 years ago
- ☆12Jan 10, 2023Updated 3 years ago
- An active inference model of Lacanian psychoanalysis☆15Jun 7, 2025Updated 8 months ago
- This is the official implementation for MA-LoT.☆19Aug 4, 2025Updated 6 months ago
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, object detections, in a few lines of Python code.☆11Nov 27, 2022Updated 3 years ago
- Develop C++/CUDA extensions with PyTorch like Python scripts☆10Jan 7, 2026Updated last month
- Material parsers and other tools, scripts Initially developed for Grobid Superconductor☆13Feb 21, 2025Updated last year
- Implementing LRP (Layer-wise Relevance Propagation) for a sequence-to-sequence model with GRU layers.☆12Sep 8, 2023Updated 2 years ago
- A curated list of 150+ papers and resources on Agentic Security. Based on the survey covering the transition from passive LLMs to autonom…☆28Dec 6, 2025Updated 2 months ago
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information☆13Oct 1, 2024Updated last year
- CANdle - a library for using USB-FDCAN dongle and communicating with md80 drives☆15Sep 15, 2025Updated 5 months ago
- A Mechanistic‑Interpretability study that finds the structural dynamics of Large Language Models under fine‑tuning.☆16May 30, 2025Updated 9 months ago
- DL Backtrace is a new explainablity technique for deep learning models that works for any modality and model type.☆23Feb 16, 2026Updated 2 weeks ago
- [GSI 2023] Learning Lagrangian Fluid Mechanics with E(3)-Equivariant GNNs☆15Jun 3, 2024Updated last year
- Official Implementation of "The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thou…☆14Jul 2, 2025Updated 8 months ago
- Package to estimate the grouping loss of a classifier, based on the paper "Beyond calibration: estimating the grouping loss of modern neu…☆11Dec 14, 2024Updated last year