ServiceNow / DoomArena
DoomArena is a Framework for Testing AI Agents Against Evolving Security Threats
☆19 Updated this week
Alternatives and similar repositories for DoomArena:
Users who are interested in DoomArena are comparing it to the libraries listed below.
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆180 Updated last week
- A benchmark for LLMs on complicated tasks in the terminal ☆30 Updated this week
- ☆91 Updated 2 months ago
- This repository contains data, code and models for contextual noncompliance. ☆21 Updated 9 months ago
- ☆26 Updated last month
- ☆37 Updated 7 months ago
- AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re… ☆305 Updated this week
- ☆133 Updated 5 months ago
- Fluent student-teacher redteaming ☆20 Updated 9 months ago
- Collection of evals for Inspect AI ☆117 Updated this week
- ☆39 Updated 2 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆103 Updated last year
- ☆25 Updated last year
- This is the official repo for Towards Uncertainty-Aware Language Agent. ☆24 Updated 8 months ago
- A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and emerging privacy risk of LM … ☆25 Updated last month
- ☆64 Updated this week
- ☆15 Updated 3 weeks ago
- ☆34 Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆23 Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆85 Updated 6 months ago
- ☆54 Updated 7 months ago
- ☆28 Updated 3 months ago
- Discovering Data-driven Hypotheses in the Wild ☆76 Updated 5 months ago
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆54 Updated 2 months ago
- ☆42 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆63 Updated this week
- ☆87 Updated 9 months ago
- ☆11 Updated 6 months ago
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆16 Updated 5 months ago
- ☆53 Updated last year