tml-epfl/os-harm

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [NeurIPS 2025 Spotlight]

☆69

Alternatives and similar repositories for os-harm

Users who are interested in os-harm are comparing it to the repositories listed below.

tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33 · Jan 23, 2025 · Updated last year
OSU-NLP-Group / RedTeamCUA
[ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
☆57 · Feb 9, 2026 · Updated 5 months ago
jonasrauber / linear-region-attack
A powerful white-box adversarial attack that exploits knowledge about the geometry of neural networks to find minimal adversarial perturb…
☆12 · Aug 5, 2020 · Updated 5 years ago
THUDM / ComputerRL
☆40 · Nov 7, 2025 · Updated 8 months ago
tim-hua-01 / steering-eval-awareness-public
☆17 · Mar 16, 2026 · Updated 4 months ago
valentyn1boreiko / SVCEs_code
☆13 · Jun 23, 2022 · Updated 4 years ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols for running control experiments.
☆210 · Updated this week
UKGovernmentBEIS / aisi-sandboxing
The open-source AISI toolkit for sandboxing agentic evaluations
☆25 · Aug 7, 2025 · Updated 11 months ago
guardagent / code
☆47 · Dec 9, 2025 · Updated 7 months ago
max-andr / provable-robustness-max-linear-regions
Provable Robustness of ReLU networks via Maximization of Linear Regions [AISTATS 2019]
☆31 · Jul 15, 2020 · Updated 6 years ago
shuita2333 / AutoDoS
Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
☆25 · Sep 1, 2025 · Updated 10 months ago
brucewlee / mini-control-arena
AI Control evaluation library. Built natively on Inspect AI.
☆17 · Feb 25, 2026 · Updated 4 months ago
sjpark5800 / LA-DETR
[WACV 2026] MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
☆14 · Sep 18, 2025 · Updated 10 months ago
tml-epfl / sharpness-vs-generalization
A modern look at the relationship between sharpness and generalization [ICML 2023]
☆44 · Sep 11, 2023 · Updated 2 years ago
locuslab / intermediate_robustness
☆15 · Dec 7, 2021 · Updated 4 years ago
locuslab / robust_union
[ICML'20] Multi Steepest Descent (MSD) for robustness against the union of multiple perturbation models.
☆25 · Jul 25, 2024 · Updated last year
thu-coai / Agent-SafetyBench
☆149 · Aug 11, 2025 · Updated 11 months ago
ugonfor / DGQ
[ICLR 2025] DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
☆19 · Mar 25, 2025 · Updated last year
microsoft / llmail-inject-challenge
Code for the API, workload execution, and agents underlying the LLMail-Inject Adaptive Prompt Injection Challenge
☆25 · Apr 9, 2026 · Updated 3 months ago
METR / hawk
Run Inspect AI evals in the cloud
☆32 · Updated this week
CLAIRE-Labo / no-representation-no-trust
Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (M…
☆33 · Nov 20, 2024 · Updated last year
USSLab / TPatch
[USENIX'23] TPatch: A Triggered Physical Adversarial Patch
☆25 · Aug 8, 2023 · Updated 2 years ago
AI-secure / RedCode
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
☆85 · Apr 24, 2026 · Updated 2 months ago
fra31 / rlhf-trojan-competition-submission
☆19 · Feb 25, 2024 · Updated 2 years ago
rgreenblatt / control-evaluations
☆25 · May 25, 2024 · Updated 2 years ago
ErxinYu / CoSafe-Dataset
☆13 · Nov 12, 2024 · Updated last year
trustworthy-machine-learning / trustworthy-machine-learning.github.io
A School for All Seasons on Trustworthy Machine Learning
☆12 · Jun 30, 2021 · Updated 5 years ago
SalesforceAIResearch / CoAct-1
CoAct-1: Computer-using Agents with Coding as Actions
☆27 · Jun 2, 2026 · Updated last month
cassidylaidlaw / perceptual-advex
Code and data for the ICLR 2021 paper "Perceptual Adversarial Robustness: Defense Against Unseen Threat Models".
☆56 · Jan 18, 2022 · Updated 4 years ago
google-deepmind / agent_debugger
Causal Analysis of Agent Behavior for AI Safety
☆21 · Jun 27, 2023 · Updated 3 years ago
ethz-spylab / agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆670 · Jun 2, 2026 · Updated last month
tml-epfl / understanding-fast-adv-training
Understanding and Improving Fast Adversarial Training [NeurIPS 2020]
☆96 · Sep 23, 2021 · Updated 4 years ago
qiancheng0 / EscapeBench
Repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"
☆18 · Dec 19, 2024 · Updated last year
GAIR-NLP / OPO
☆50 · Mar 2, 2024 · Updated 2 years ago
halfrot / ALaRM
[ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"
☆25 · Mar 28, 2024 · Updated 2 years ago
LTS4 / neural-anisotropy-directions
Source code for "Neural Anisotropy Directions"
☆16 · Nov 17, 2020 · Updated 5 years ago
tml-epfl / sam-low-rank-features
Sharpness-Aware Minimization Leads to Low-Rank Features [NeurIPS 2023]
☆29 · Sep 22, 2023 · Updated 2 years ago
DSLwDE / DSLwDE
☆14 · Jul 25, 2025 · Updated 11 months ago
LTS4 / hold-me-tight
Source code of "Hold me tight! Influence of discriminative features on deep network boundaries"
☆21 · Dec 10, 2021 · Updated 4 years ago