zihao-ai / EARBench
Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents
⭐17 · Updated 4 months ago
Alternatives and similar repositories for EARBench:
Users who are interested in EARBench are comparing it to the repositories listed below
- [ICLR 2024 Spotlight 🔥] [Best Paper Award SoCal NLP 2023] Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ⭐50 · Updated 10 months ago
- 🔥🔥🔥 Breaking long thought processes of o1-like LLMs, such as DeepSeek-R1, QwQ ⭐28 · Updated last month
- ⭐47 · Updated 3 months ago
- ⭐21 · Updated 7 months ago
- ⭐44 · Updated 8 months ago
- Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks" ⭐14 · Updated 4 months ago
- Accepted by ECCV 2024 ⭐122 · Updated 6 months ago
- Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models ⭐18 · Updated last month
- A package that achieves 95%+ transfer attack success rate against GPT-4 ⭐19 · Updated 5 months ago
- [MM'23 Oral] "Text-to-image diffusion models can be easily backdoored through multimodal data poisoning" ⭐28 · Updated last month
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ⭐133 · Updated last month
- [ECCV 2024] Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajector… ⭐24 · Updated 4 months ago
- A survey on harmful fine-tuning attacks for large language models ⭐157 · Updated last week
- A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models ⭐165 · Updated this week
- Implementation of BadCLIP https://arxiv.org/pdf/2311.16194.pdf ⭐20 · Updated last year
- ⭐69 · Updated 8 months ago
- Official PyTorch implementation of "Towards Adversarial Attack on Vision-Language Pre-training Models" ⭐59 · Updated 2 years ago
- [ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images ⭐33 · Updated last year
- ⭐42 · Updated 4 months ago
- Up-to-date collection of LLM watermarking papers ⭐13 · Updated last year
- This is the code repository of our submission: Understanding the Dark Side of LLMs' Intrinsic Self-Correction. ⭐56 · Updated 4 months ago
- Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ⭐268 · Updated last week
- Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ⭐23 · Updated last year
- A toolbox for backdoor attacks. ⭐21 · Updated 2 years ago
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ⭐53 · Updated 9 months ago
- ⭐39 · Updated 10 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ⭐63 · Updated this week
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024) ⭐33 · Updated last year
- Code for Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks (TIFS 2024) ⭐12 · Updated last year
- ⭐9 · Updated 3 years ago