Ymm-cll/TrustAgent

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Ymm-cll/TrustAgent)

Ymm-cll / TrustAgent

☆99

Alternatives and similar repositories for TrustAgent

Users that are interested in TrustAgent are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

wslong20 / G-safeguard
View on GitHub
☆42Jun 28, 2025Updated last year
CGCL-codes / Gen-AF
View on GitHub
The implementation of our IEEE S&P 2024 paper "Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples".
☆11Jun 28, 2024Updated 2 years ago
LetterLiGo / Agent-WebCloak
View on GitHub
[IEEE S&P'26] WebCloak: Characterizing and Mitigating the Threats of LLM-Driven Web Agents as Intelligent Scrapers
☆28Jan 31, 2026Updated 5 months ago
kagnlp / Awesome-Agentic-Security
View on GitHub
A curated list of 260+ papers and resources on Agentic Security. Based on the survey covering the transition from passive LLMs to autonom…
☆50Jun 14, 2026Updated last month
clearloveclearlove / BEAT
View on GitHub
☆15Feb 26, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
XuanChen-xc / RLbreaker
View on GitHub
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆18Oct 22, 2024Updated last year
uiuc-kang-lab / InjecAgent
View on GitHub
☆153Jul 2, 2024Updated 2 years ago
yanweiyue / masrouter
View on GitHub
☆131Oct 29, 2025Updated 8 months ago
LucasFenaux / PILLAR-ESPN
View on GitHub
Code for the paper: Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions
☆12Mar 13, 2024Updated 2 years ago
wbopan / safety-residual-space
View on GitHub
Multi-dimensional analysis of orthogonal safety directions in LLM alignment
☆23Jun 12, 2026Updated last month
OSU-NLP-Group / AgentSafety
View on GitHub
☆192Oct 31, 2025Updated 8 months ago
xingjunm / Awesome-Large-Model-Safety
View on GitHub
Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety
☆276Apr 12, 2026Updated 3 months ago
brandeis-machine-learning / influence-fairness
View on GitHub
Code for ICML 2022 paper: Achieving Fairness at No Utility Cost via Data Reweighing with Influence
☆11Aug 3, 2022Updated 3 years ago
chuhac / Reasoning-to-Defend
View on GitHub
[EMNLP 2025] Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
☆12Aug 22, 2025Updated 11 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
CryptoAILab / Awesome-LM-SSP
View on GitHub
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
☆2,020Jun 17, 2026Updated last month
ethz-spylab / jailbreak-tax
View on GitHub
☆24Feb 17, 2026Updated 5 months ago
meng-wenlong / LMSanitator
View on GitHub
☆29Aug 21, 2023Updated 2 years ago
xjzzzzzzzz / MCPSafety
View on GitHub
☆22Dec 18, 2025Updated 7 months ago
ybwang119 / Awesome-reasoning-safety
View on GitHub
This repo is for the safety topic, including attacks, defenses and studies related to reasoning and RL
☆67Sep 5, 2025Updated 10 months ago
tianshuocong / SSLGuard
View on GitHub
[CCS'22] SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders
☆18Jul 12, 2022Updated 4 years ago
AI-secure / TextGuard
View on GitHub
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
☆15Nov 7, 2023Updated 2 years ago
DanielLin97 / FACT-AUDIT
View on GitHub
An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
☆18Feb 27, 2025Updated last year
Bob-cheng / DepthModelHardening
View on GitHub
Official PyTorch implementation of our paper "Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World A…
☆10Feb 8, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
guardagent / code
View on GitHub
☆47Dec 9, 2025Updated 7 months ago
xxiqiao / TROJail
View on GitHub
Official implementation of "TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards"
☆31Updated this week
wearetyomsmnv / Awesome-LLM-agent-Security
View on GitHub
All about llm-agents security,attack,vulnerabilities and how to do them for cybersecurity.
☆54Jul 11, 2026Updated 2 weeks ago
xunguangwang / SoK4JailbreakGuardrails
View on GitHub
[S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models
☆44Dec 17, 2025Updated 7 months ago
NicerWang / ToolCommander
View on GitHub
[NAACL 2025 Main] Official implementation of "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection"…
☆22Jun 11, 2025Updated last year
jiaxiaojunQAQ / SkillJect
View on GitHub
SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
☆73Jun 11, 2026Updated last month
salman-lui / x-teaming
View on GitHub
☆67May 21, 2025Updated last year
compsec-snu / pfi
View on GitHub
PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents
☆31Mar 26, 2025Updated last year
jiaxiaojunQAQ / I-GCG
View on GitHub
Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)
☆146Apr 7, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
reds-lab / BEEAR
View on GitHub
This is the official Gtihub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆23Jul 3, 2024Updated 2 years ago
liuchengwucn / Safe
View on GitHub
(ACL 2025 Main) Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification - Offici…
☆21Dec 26, 2025Updated 7 months ago
liudaizong / Awesome-LVLM-Attack
View on GitHub
😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.
☆566Jul 15, 2026Updated last week
Alpha-Yang / Char-adv-using-LID-torch
View on GitHub
There are my Pytorch codes for charactering adversarial subspace using local intrinsic dimensionality.
☆13Apr 26, 2022Updated 4 years ago
facebookresearch / ai-agent-privacy
View on GitHub
Dataset and evaluation benchmark for Privacy Leakage Evaluation of Autonomous Web Agents
☆45Apr 18, 2026Updated 3 months ago
AI-secure / AgentPoison
View on GitHub
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆231Jun 17, 2026Updated last month
Eyr3 / PrivacyAsst
View on GitHub
PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents (TDSC 2024)
☆19Mar 29, 2024Updated 2 years ago