alohachen / Hide-and-Seek (☆24)
Hide and Seek (HaS): A Framework for Prompt Privacy Protection
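HaS protects a prompt with a round trip: private entities are swapped for surrogates locally before the prompt reaches a remote LLM (the "hide" step), and the surrogates in the returned answer are mapped back to the originals (the "seek" step). The sketch below illustrates only that round trip under simplifying assumptions; `hide`, `seek`, the dictionary-based substitution, and the stand-in remote answer are all hypothetical, and the framework itself uses small local models for these steps rather than a lookup table.

```python
# Minimal illustrative sketch of a hide/seek round trip, NOT the repo's
# actual implementation: hide(), seek(), and the entity table are
# hypothetical stand-ins for HaS's anonymize/de-anonymize steps.

def hide(prompt: str, entities: dict[str, str]) -> str:
    """Replace private entities with surrogates before the prompt
    leaves the local machine."""
    for original, surrogate in entities.items():
        prompt = prompt.replace(original, surrogate)
    return prompt

def seek(response: str, entities: dict[str, str]) -> str:
    """Map surrogates in the remote model's answer back to the
    original private entities."""
    for original, surrogate in entities.items():
        response = response.replace(surrogate, original)
    return response

entities = {"Alice Zhang": "Person_A", "Acme Corp": "Company_B"}
prompt = "Draft an email from Alice Zhang terminating the Acme Corp contract."
safe_prompt = hide(prompt, entities)  # only this anonymized text is sent out
# remote_answer = call_remote_llm(safe_prompt)  # hypothetical remote API call
remote_answer = "Dear Company_B, Person_A hereby terminates ..."  # stand-in
print(seek(remote_answer, entities))  # originals restored locally
```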
Related projects:
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors (☆134)
- Federated learning for LLMs (☆144)
- Official GitHub repo for SafetyBench, a comprehensive benchmark for evaluating LLMs' safety (☆141)
- Fudan BaiZe LLM safety benchmark suite (Summer 2024 edition) (☆21)
- Shepherd: A foundational framework enabling federated instruction tuning for large language models (☆198)
- A survey of privacy problems in large language models (LLMs), with summaries of the corresponding papers and relevant code (☆58)
- [ICML 2024] TrustLLM: Trustworthiness in Large Language Models (☆432)
- A collection of automated evaluators for assessing jailbreak attempts (☆55)
- MarkLLM: An Open-Source Toolkit for LLM Watermarking (☆246)
- LLM Unlearning (☆112)
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey (☆65)
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (☆57)
- Flames: a highly adversarial Chinese benchmark for evaluating LLM harmlessness, developed by Shanghai AI Lab and the Fudan NLP Group (☆30)
- Papers and resources related to the security and privacy of LLMs 🤖 (☆393)
- [ACL 2024] SALAD benchmark & MD-Judge (☆81)
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… (☆219)
- Code for analysing the leakage of personally identifiable information (PII) from the output of next word prediction language models (☆79)
- [ICLR'24 Spotlight] DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer (☆28)
- A toolkit to assess data privacy in LLMs (under development) (☆36)
- Accepted by ECCV 2024 (☆59)
- A curated list of trustworthy generative AI papers, updated daily (☆67)
- Code for the Findings of EMNLP 2023 paper "Multi-step Jailbreaking Privacy Attacks on ChatGPT" (☆20)
- Accepted by the IJCAI-24 Survey Track (☆117)
- Code and data for our paper "Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark"… (☆47)