SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆33 · Updated 6 months ago
Alternatives and similar repositories for PopupAttack
Users interested in PopupAttack are comparing it to the repositories listed below
- ☆19 · Updated 8 months ago
- ☆19 · Updated 8 months ago
- ☆30 · Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆99 · Updated last month
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆61 · Updated last month
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆79 · Updated 8 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆61 · Updated 5 months ago
- 🚀 SWE-bench Goes Live! ☆80 · Updated this week
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆91 · Updated 4 months ago
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆15 · Updated 5 months ago
- ☆40 · Updated 2 weeks ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 · Updated 2 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆94 · Updated last year
- Training and Benchmarking LLMs for Code Preference. ☆33 · Updated 7 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 · Updated last month
- ☆26 · Updated last year
- ☆20 · Updated last month
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆76 · Updated last month
- ☆41 · Updated 8 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆27 · Updated 4 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | ACL 2024 SRW (Oral) ☆63 · Updated 8 months ago
- ☆37 · Updated last year
- ☆26 · Updated this week
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆50 · Updated 2 months ago
- This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs" ☆32 · Updated 9 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 · Updated 7 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆28 · Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆19 · Updated 8 months ago
- Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025) ☆55 · Updated 3 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆77 · Updated 2 months ago