Dtc7w3PQ / Response-Attack
Official implementation of “Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models” (AAAI 2026).
☆31 · Updated last month
Alternatives and similar repositories for Response-Attack
Users interested in Response-Attack are comparing it to the repositories listed below.
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … ☆81 · Updated this week
- A Diagnostic Guardrail Framework for AI Agent Safety and Security ☆316 · Updated this week
- ☆56 · Updated last year
- Accepted by ECCV 2024 ☆185 · Updated last year
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts. ☆28 · Updated 5 months ago
- A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models ☆302 · Updated 3 weeks ago
- 😎 Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ☆485 · Updated last week
- Accepted by IJCAI-24 Survey Track ☆230 · Updated last year
- Awesome jailbreak and red-teaming arXiv papers (automatically updated every 12 hours) ☆94 · Updated this week
- Code for the paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025) ☆21 · Updated 9 months ago
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning … ☆88 · Updated 5 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆191 · Updated 7 months ago
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models. ☆125 · Updated 5 months ago
- ☆64 · Updated 8 months ago
- A survey on harmful fine-tuning attacks for large language models ☆232 · Updated last month
- ☆55 · Updated 8 months ago
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper) ☆393 · Updated 3 months ago
- ☆174 · Updated 3 months ago
- ☆121 · Updated last year
- [EMNLP 2024 Findings] Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information ☆13 · Updated last year
- Safety at Scale: A Comprehensive Survey of Large Model Safety ☆225 · Updated this week
- Code and data for the paper "Can Watermarked LLMs be Identified by Users via Crafted Prompts?", accepted at ICLR 2025 (Spotlight) ☆28 · Updated last year
- DSN Jailbreak Attack & Evaluation Ensemble ☆16 · Updated last month
- ☆57 · Updated 8 months ago
- [ACL 2025] Data and code for the paper "VLSBench: Unveiling Visual Leakage in Multimodal Safety" ☆53 · Updated 6 months ago
- Official repository for "CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation" ☆68 · Updated last month
- 🚀 A curated list of awesome resources focusing on Context Compression techniques for Large Language Models (LLMs). ☆57 · Updated 3 weeks ago
- Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning" ☆88 · Updated 11 months ago
- Reinforcement learning code for the SPA-VL dataset ☆44 · Updated last year
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector" ☆47 · Updated 3 months ago