Dtc7w3PQ / Response-Attack
Official implementation of “Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models” (AAAI 2026).
☆31 · Updated last month
Alternatives and similar repositories for Response-Attack
Users interested in Response-Attack are comparing it to the repositories listed below.
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … ☆81 · Updated this week
- A Diagnostic Guardrail Framework for AI Agent Safety and Security ☆316 · Updated this week
- ☆56 · Updated last year
- Accepted by ECCV 2024 ☆185 · Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts. ☆28 · Updated 5 months ago
- A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models ☆302 · Updated 3 weeks ago
- 😎 Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ☆485 · Updated last week
- Accepted by IJCAI-24 Survey Track ☆230 · Updated last year
- Awesome jailbreak and red-teaming arXiv papers (automatically updated every 12 hours) ☆94 · Updated this week
- Code for the paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025) ☆21 · Updated 9 months ago
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning … ☆88 · Updated 5 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆191 · Updated 7 months ago
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models. ☆125 · Updated 5 months ago
- ☆64 · Updated 8 months ago
- A survey on harmful fine-tuning attacks for large language models ☆232 · Updated last month
- ☆55 · Updated 8 months ago
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper) ☆393 · Updated 3 months ago
- ☆174 · Updated 3 months ago
- ☆121 · Updated last year
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information ☆13 · Updated last year
- Safety at Scale: A Comprehensive Survey of Large Model Safety ☆225 · Updated this week
- Code and data for the paper "Can Watermarked LLMs be Identified by Users via Crafted Prompts?", accepted at ICLR 2025 (Spotlight) ☆28 · Updated last year
- DSN Jailbreak Attack & Evaluation Ensemble ☆16 · Updated last month
- ☆57 · Updated 8 months ago
- [ACL 2025] Data and code for the paper "VLSBench: Unveiling Visual Leakage in Multimodal Safety" ☆53 · Updated 6 months ago
- Official repository for "CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation" ☆68 · Updated last month
- 🚀 A curated list of awesome resources focusing on Context Compression techniques for Large Language Models (LLMs). ☆57 · Updated 3 weeks ago
- Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning" ☆88 · Updated 11 months ago
- Reinforcement learning code for the SPA-VL dataset ☆44 · Updated last year
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" ☆47 · Updated 3 months ago