This repository collects work on safety topics, including attacks, defenses, and studies related to reasoning and reinforcement learning (RL).
☆61 · Updated Sep 5, 2025
Alternatives and similar repositories for Awesome-reasoning-safety
Users interested in Awesome-reasoning-safety are comparing it to the repositories listed below.
- [ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks ☆14 · Updated Feb 6, 2024
- [NDSS'25] The official implementation of safety misalignment. ☆17 · Updated Jan 8, 2025
- ☆14 · Updated Feb 26, 2025
- [ICCV 2023] AdaptGuard: Defending Against Universal Attacks for Model Adaptation ☆11 · Updated Dec 23, 2023
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … ☆82 · Updated this week
- Official code for the ICML 2024 paper "Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models" ☆19 · Updated Jun 12, 2024
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)☆49Jan 15, 2026Updated last month
- [CVPR 2025] R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning ☆22 · Updated Aug 28, 2025
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)☆26Sep 10, 2024Updated last year
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data ☆14 · Updated Feb 26, 2025
- ☆10 · Updated Jul 3, 2024
- A repository introducing research topics related to protecting the intellectual property (IP) of AI from a data-centric perspec…