SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆47 Updated 11 months ago
Alternatives and similar repositories for PopupAttack
Users who are interested in PopupAttack are comparing it to the repositories listed below.
- [ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast ☆117 Updated last year
- [NeurIPS 2025 Spotlight] ReasonFlux-Coder: Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆133 Updated 2 months ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆115 Updated 9 months ago
- ☆30 Updated last year
- ☆22 Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆98 Updated last year
- ☆50 Updated 9 months ago
- ☆33 Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 Updated 6 months ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆13 Updated 9 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆89 Updated 6 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆71 Updated 6 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆84 Updated last year
- ☆42 Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆116 Updated 6 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆172 Updated last year
- ☆22 Updated last year
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆30 Updated 5 months ago
- ☆64 Updated 5 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆47 Updated 4 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆65 Updated 10 months ago
- Codebase for Inference-Time Policy Adapters