alohachen / Hide-and-Seek

Hide and Seek (HaS): A Framework for Prompt Privacy Protection

☆48

Alternatives and similar repositories for Hide-and-Seek

Users who are interested in Hide-and-Seek are comparing it to the repositories listed below

thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆214 Updated last year
thu-coai / SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆260 Updated 3 months ago
IS2Lab / S-Eval
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
☆99 Updated 2 weeks ago
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106 Updated last year
CLUEbenchmark / SuperCLUE-Safety
SC-Safety: A multi-round adversarial safety benchmark for Chinese large language models
☆146 Updated last year
xinleihe / MGTBench
☆160 Updated 9 months ago
Xianjun-Yang / Awesome_papers_on_LLMs_detection
The latest papers on the detection of LLM-generated text and code
☆280 Updated 4 months ago
chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
☆539 Updated 4 months ago
whitzard-ai / jade-db
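"Stones from other hills may serve to polish jade": Fudan Whitzard AI releases JADE-DB, a demo dataset targeting Chinese open-source and foreign commercial large models
☆468 Updated last week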
Allen-piexl / JailbreakZoo
☆153 Updated last year
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆210 Updated 8 months ago
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆152 Updated 11 months ago
Ymm-cll / TrustAgent
☆66 Updated 7 months ago
AI45Lab / ActorAttack
☆109 Updated 9 months ago
AI45Lab / Flames
Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.
☆60 Updated last year
Lordog / R-Judge
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)
☆91 Updated 5 months ago
IAAR-Shanghai / SafeRAG
☆46 Updated 7 months ago
alipay / private_llm
☆35 Updated last year
agiresearch / ASB
Agent Security Bench (ASB)
☆137 Updated 3 weeks ago
xingjunm / Awesome-Large-Model-Safety
Safety at Scale: A Comprehensive Survey of Large Model Safety
☆200 Updated 8 months ago
CryptoAILab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆172 Updated 7 months ago
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆38 Updated 9 months ago
STAIR-BUPT / STAIR-LLMGuardrails
☆12 Updated last year
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆105 Updated 9 months ago
safr-ai-lab / survey-llm
A survey of privacy problems in Large Language Models (LLMs). Contains a summary of the corresponding paper along with relevant code
☆68 Updated last year
FederatedAI / FATE-LLM
Federated Learning for LLMs.
☆235 Updated last week
X-PLUG / CValues
Research on evaluating and aligning the values of Chinese large language models
☆542 Updated 2 years ago
OpenSafetyLab / SALAD-BENCH
[ACL 2024] SALAD benchmark & MD-Judge
☆163 Updated 7 months ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆94 Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆41 Updated last year