sani903/OpenAgentSafety

sani903 / OpenAgentSafety

A Framework for Evaluating AI Agent Safety in Realistic Environments

☆38

Alternatives and similar repositories for OpenAgentSafety

Users who are interested in OpenAgentSafety are comparing it to the libraries listed below.

xsddys / TRACE
TRACE, a framework for turn-aware credit assignment for multi-turn jailbreak optimization
☆19 • Jun 22, 2026 • Updated 3 weeks ago
yuki-younai / MTSA
Official implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16 • Jun 2, 2025 • Updated last year
secure-software-engineering / SPDS-experiments
☆11 • Oct 10, 2018 • Updated 7 years ago
lalalamdbf / PLSE_IDRR
The Code for the EMNLP 2023 main conference paper "Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition…
☆13 • Dec 10, 2023 • Updated 2 years ago
jianshuod / SafeSearch
[ICML 2026] Official implementation of "SafeSearch: Automated Red-Teaming of LLM-Based Search Agents"
☆19 • Mar 25, 2026 • Updated 3 months ago
EvanZhuang / vector-icl
Official implementation of Vector-ICL: In-context Learning with Continuous Vector Representations (ICLR 2025)
☆24 • Jun 2, 2025 • Updated last year
salman-lui / x-teaming
☆68 • May 21, 2025 • Updated last year
snu-mllab / Bayesian-Red-Teaming
Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)
☆15 • Jul 9, 2023 • Updated 3 years ago
AI45Lab / MAGIC
Code for paper "MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM safety"
☆51 • May 11, 2026 • Updated 2 months ago
asahala / BabyLemmatizer
State-of-the-art neural tagger and lemmatizer for ancient languages
☆15 • Mar 26, 2026 • Updated 3 months ago
lihongcs / LLM_Inception
[ICLR 2025] This repo is the official implementation of "The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs".
☆13 • Jan 25, 2025 • Updated last year
AI45Lab / X-Boundary
[EMNLP 2025] The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Com…
☆41 • Nov 24, 2025 • Updated 7 months ago
guardagent / code
☆47 • Dec 9, 2025 • Updated 7 months ago
SaFo-Lab / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆96 • May 9, 2025 • Updated last year
agiresearch / ASB
Agent Security Bench (ASB)
☆271 • Apr 16, 2026 • Updated 3 months ago
JackShDr / InfluentialRS
Implementations of Influential Recommender System
☆12 • Oct 29, 2024 • Updated last year
KID-22 / Source-Bias
Code for "Neural Retrievers are Biased Towards LLM-Generated Content"
☆14 • Oct 18, 2024 • Updated last year
parameterlab / leaky_thoughts
Source code of "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" (EMNLP 2025)
☆17 • Jan 12, 2026 • Updated 6 months ago
shuita2333 / AutoDoS
Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
☆25 • Sep 1, 2025 • Updated 10 months ago
saferlhf-v / saferlhf-v
☆23 • Jun 16, 2025 • Updated last year
Zsbyqx20 / AgentHazard
Mobile GUI Agents under Real-world Threats: Are We There Yet?
☆17 • May 18, 2026 • Updated 2 months ago
hanningzhang / prm
☆17 • Nov 3, 2024 • Updated last year
Tencent-Hunyuan / DisCa
DisCa: Accelerating Video Diffusion Transformers with Distillation-Compatible Learnable Feature Caching
☆24 • Apr 15, 2026 • Updated 3 months ago
amuta / DDPG-MountainCarContinuous-v0
Solving the OpenAI Gym (MountainCarContinuous-v0) with DDPG
☆21 • Jan 23, 2023 • Updated 3 years ago
HKUST-KnowComp / IntentionQA
Code and data for the paper: IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Large Language Models …
☆12 • Apr 27, 2024 • Updated 2 years ago
VITA-Group / Sparsity-Win-Robust-Generalization
[ICLR 2022] "Sparsity Winning Twice: Better Robust Generalization from More Efficient Training" by Tianlong Chen*, Zhenyu Zhang*, Pengjun…
☆40 • Mar 20, 2022 • Updated 4 years ago
VirtualBoBs / QEMUSLNetFuzz
Stateless Network Fuzzer for QEMU (Targeting SLiRP)
☆17 • Oct 19, 2020 • Updated 5 years ago
HeadyZhang / agent-audit
Static security scanner for LLM agents: prompt injection, MCP config auditing, taint analysis. 51 rules mapped to OWASP Agentic Top 10 (…
☆199 • Jul 4, 2026 • Updated 2 weeks ago
lonePatient / label_smoothing_pytorch
PyTorch implementation of Label Smoothing
☆32 • Dec 16, 2019 • Updated 6 years ago
brucewlee / self-incrimination
Code used for "Training Agents to Self-Report Misbehavior"
☆18 • Feb 27, 2026 • Updated 4 months ago
thu-coai / LongSafety
[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
☆16 • Jun 18, 2025 • Updated last year
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆81 • Mar 30, 2025 • Updated last year
SaFo-Lab / DynAuditClaw
DynAuditClaw: A security audit skill that dynamically discovers your OpenClaw agent's real configuration, designs targeted attack scenar…
☆15 • Apr 6, 2026 • Updated 3 months ago
OSU-NLP-Group / EIA_against_webagent
☆40 • Oct 2, 2024 • Updated last year
SchwinnL / LLM_Embedding_Attack
Code to conduct an embedding attack on LLMs
☆33 • Jan 10, 2025 • Updated last year
SalesforceAIResearch / indict_code_gen
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
☆15 • Jun 2, 2026 • Updated last month
CommissarSilver / CVT
This repository contains the replication package of our paper "Assessing the Security of GitHub Copilot’s Generated Code - A Targeted Rep…
☆10 • Nov 16, 2023 • Updated 2 years ago
ProofAgent-ai / proofagent-harness
Open-source test harness for AI agents. Stress-test production agents with adversarial multi-turn scenarios in CI
☆16 • Jul 13, 2026 • Updated last week
IBM / model-sanitization
Code for reproducing the results of the paper "Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness" published at IC…
☆27 • Apr 29, 2020 • Updated 6 years ago