AI4Good24/PsySafe

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AI4Good24/PsySafe)

AI4Good24 / PsySafe

☆53

Alternatives and similar repositories for PsySafe

Users that are interested in PsySafe are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zhangzaibin / AD-H
View on GitHub
☆15May 21, 2026Updated last month
AI45Lab / DEAN
View on GitHub
☆11Oct 25, 2024Updated last year
sail-sg / Agent-Smith
View on GitHub
[ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
☆123Mar 26, 2024Updated 2 years ago
IranQin / MP5
View on GitHub
[CVPR2024] This is the official implement of MP5
☆105Jun 30, 2024Updated 2 years ago
Jometeorie / KnowledgeSpread
View on GitHub
☆40Oct 15, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
AI45Lab / VLSBench
View on GitHub
[ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety
☆62Jul 21, 2025Updated 11 months ago
IranQin / Awesome_World_Model_Papers
View on GitHub
[World-Model-Survey-2024] Paper list and projects for World Model
☆15Oct 31, 2024Updated last year
ChnQ / TracingLLM
View on GitHub
☆30May 22, 2024Updated 2 years ago
fzwark / Secure_LLM_System
View on GitHub
☆16Mar 9, 2025Updated last year
lancopku / agent-backdoor-attacks
View on GitHub
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆115Sep 27, 2024Updated last year
OpenSafetyLab / SALAD-BENCH
View on GitHub
【ACL 2024】 SALAD benchmark & MD-Judge
☆176Mar 8, 2025Updated last year
AI45Lab / DeepScan
View on GitHub
Diagnostic Framework for LLMs and MLLMs
☆39Mar 2, 2026Updated 4 months ago
xirui-li / MOSSBench
View on GitHub
An implementation for MLLM oversensitivity evaluation
☆18Nov 16, 2024Updated last year
pasquini-dario / LLM_NeuralExec
View on GitHub
Code to generate NeuralExecs (prompt injection for LLMs)
☆27Oct 5, 2025Updated 9 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
rucnyz / LeakAgent
View on GitHub
☆29Aug 31, 2025Updated 10 months ago
song2yu / SIBench-VSR
View on GitHub
This is a project on visual spatial reasoning tasks-SIBench
☆27Jan 12, 2026Updated 6 months ago
AI45Lab / DeepSafe
View on GitHub
All-in-One Safety Evaluation Framwork
☆51Updated this week
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
ChnQ / MI-Peaks
View on GitHub
☆68Jul 14, 2025Updated last year
hengzzzhou / ReSo
View on GitHub
☆25Jan 29, 2026Updated 5 months ago
kodenii / Responsible-Robotic-Manipulation
View on GitHub
Responsible Robotic Manipulation
☆16Aug 31, 2025Updated 10 months ago
song2yu / world-model-vla
View on GitHub
World Model & VLA Survey - Interactive Research Page
☆17May 26, 2026Updated last month
NASP-THU / CSEBenchmark
View on GitHub
The official repository of the paper "The Digital Cybersecurity Expert: How Far Have We Come?" presented in IEEE S&P 2025
☆24May 21, 2025Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
Eyr3 / PrivacyAsst
View on GitHub
PrivacyAsst: Safeguarding User Privacy in Tool-Using Large Language Model Agents (TDSC 2024)
☆19Mar 29, 2024Updated 2 years ago
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆15Dec 16, 2024Updated last year
ZhangZhuoSJTU / LINT
View on GitHub
☆17Sep 4, 2024Updated last year
compsec-snu / pfi
View on GitHub
PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents
☆30Mar 26, 2025Updated last year
ShaoShuai0605 / Misevolution
View on GitHub
Official Repo of Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
☆90Jun 2, 2026Updated last month
inspire-group / tta_risk
View on GitHub
☆15Jun 6, 2023Updated 3 years ago
Zsbyqx20 / AgentHazard
View on GitHub
Mobile GUI Agents under Real-world Threats: Are We There Yet?
☆17May 18, 2026Updated 2 months ago
SproutNan / AI-Safety_SCAV
View on GitHub
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆49Oct 13, 2025Updated 9 months ago
T1aNS1R / Evil-Geniuses
View on GitHub
☆71Feb 4, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Dtc7w3PQ / Visco-Attack
View on GitHub
Official implementation of Visco-Attack (EMNLP 2025 Main). An open-source one-click reproduction script is also provided.
☆30Apr 11, 2026Updated 3 months ago
vv19 / rendiff
View on GitHub
☆28Aug 6, 2024Updated last year
hengzzzhou / FigForge
View on GitHub
AI-powered scientific figure generator using LLM analysis and nano banana for publication-quality visualizations
☆19Feb 7, 2026Updated 5 months ago
yjyddq / RiOSWorld
View on GitHub
[NeurIPS 2025] Official repository of RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
☆123Dec 2, 2025Updated 7 months ago
RockyHHH / Safety-Evaluating
View on GitHub
本文提出了一个基于“文心一言”的中国LLMs的安全评估基准，其中包括8种典型的安全场景和6种指令攻击类型。此外，本文还提出了安全评估的框架和过程，利用手动编写和收集开源数据的测试Prompts，以及人工干预结合利用LLM强大的评估能力作为“共同评估者”。
☆34Sep 1, 2023Updated 2 years ago
ShiJiawenwen / JudgeDeceiver
View on GitHub
[CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge
☆41Sep 17, 2025Updated 10 months ago
AI45Lab / REEF
View on GitHub
The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…
☆79Jan 16, 2025Updated last year