SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆32 Updated 5 months ago
Alternatives and similar repositories for PopupAttack
Users who are interested in PopupAttack are comparing it to the repositories listed below.
- ☆19 Updated 7 months ago
- ☆18 Updated 7 months ago
- ☆30 Updated 11 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆76 Updated last month
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆93 Updated last year
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆78 Updated 7 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆27 Updated 3 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆61 Updated 4 months ago
- ☆19 Updated 3 weeks ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆60 Updated 2 weeks ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆88 Updated 3 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆15 Updated 4 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆22 Updated 5 months ago
- Codebase for Inference-Time Policy Adapters ☆23 Updated last year
- Resources for the Enigmata Project. ☆32 Updated this week
- ☆59 Updated 9 months ago
- ☆35 Updated 3 months ago
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 6 months ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆13 Updated 2 weeks ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆36 Updated last week
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆16 Updated 3 months ago
- ☆37 Updated last year
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆25 Updated 6 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆57 Updated 7 months ago
- ☆33 Updated last year
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆45 Updated 6 months ago
- ☆22 Updated 3 months ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 Updated 10 months ago