SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆29 Updated 4 months ago
Alternatives and similar repositories for PopupAttack:
Users who are interested in PopupAttack are comparing it to the repositories listed below:
- ☆18 Updated 6 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆93 Updated this week
- ☆28 Updated 10 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆77 Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆81 Updated last month
- ☆19 Updated 7 months ago
- ☆15 Updated this week
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 3 weeks ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH ☆21 Updated 4 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆74 Updated last week
- ☆24 Updated 6 months ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆84 Updated 2 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆101 Updated 2 months ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆11 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 7 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 6 months ago
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆15 Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆92 Updated 11 months ago
- ☆16 Updated last month
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆30 Updated 3 months ago
- ☆59 Updated 8 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆28 Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆47 Updated last year
- An Illusion of Progress? Assessing the Current State of Web Agents ☆42 Updated this week
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆31 Updated last month
- ☆46 Updated 2 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding ☆27 Updated last year
- ☆18 Updated last week
- Codebase for Inference-Time Policy Adapters ☆23 Updated last year