scaleapi / browser-art
☆25 · Updated last month
Alternatives and similar repositories for browser-art:
Users interested in browser-art are comparing it to the libraries listed below:
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆140 · Updated last year
- [ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents ☆80 · Updated 2 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆73 · Updated 4 months ago
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆66 · Updated last week
- Improving Alignment and Robustness with Circuit Breakers ☆196 · Updated 6 months ago
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding" ☆129 · Updated 9 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated 11 months ago
- ☆132 · Updated 5 months ago
- [ICLR 2025] Official repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆53 · Updated last month
- Weak-to-Strong Jailbreaking on Large Language Models ☆73 · Updated last year
- ☆26 · Updated 9 months ago
- ☆169 · Updated last year
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆181 · Updated this week
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆84 · Updated 5 months ago
- Code repo for the paper "Attacking Vision-Language Computer Agents via Pop-ups" ☆29 · Updated 3 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆103 · Updated last year
- Code to break Llama Guard ☆31 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction" ☆208 · Updated 6 months ago
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873) ☆151 · Updated 11 months ago
- ☆22 · Updated 6 months ago
- A lightweight library for large language model (LLM) jailbreaking defense ☆51 · Updated 6 months ago
- Dataset for the Tensor Trust project ☆39 · Updated last year
- AISafetyLab: A comprehensive framework covering safety attacks, defenses, evaluation, and a paper list ☆117 · Updated 2 weeks ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆74 · Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆68 · Updated last year
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆67 · Updated 2 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆60 · Updated 3 months ago
- Code for the paper 🌳 Tree Search for Language Model Agents ☆192 · Updated 8 months ago
- Official implementation of the ICLR'24 paper "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…) ☆74 · Updated last year
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆110 · Updated 11 months ago