THU-BPM / MarkLLM
MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 Demo)
☆334 · Updated this week
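MarkLLM implements a range of LLM watermarking algorithms. As background for the repositories below, here is a minimal sketch of the KGW-style "green-list" scheme that much of this literature builds on: a pseudorandom partition of the vocabulary, seeded by the previous token, biases generation toward "green" tokens, and detection counts green hits against the expected baseline. Function names and the hashing choice are illustrative, not MarkLLM's actual API.

```python
import hashlib
import random

def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    # Seed a PRNG from the previous token (hashed for stability across runs),
    # then pick a fraction gamma of the vocabulary as the "green" list.
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits, prev_token, delta=2.0, gamma=0.5):
    # Generation side: add delta to the logits of green-list tokens
    # before sampling, nudging the model toward watermarked output.
    greens = green_list(prev_token, len(logits), gamma)
    return [l + delta if i in greens else l for i, l in enumerate(logits)]

def z_score(tokens, vocab_size, gamma=0.5):
    # Detection side: count how many tokens fall in the green list of
    # their predecessor, and compare against the gamma baseline.
    hits = sum(
        1 for prev, tok in zip(tokens, tokens[1:])
        if tok in green_list(prev, vocab_size, gamma)
    )
    n = len(tokens) - 1
    return (hits - gamma * n) / (gamma * (1 - gamma) * n) ** 0.5
```

A large z-score indicates the green-token rate exceeds what unwatermarked text would produce by chance; the delta/gamma trade-off governs detectability versus text quality.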
Alternatives and similar repositories for MarkLLM:
Users interested in MarkLLM are comparing it to the repositories listed below.
- UP-TO-DATE LLM Watermark paper. 🔥🔥🔥 ☆324 · Updated 2 months ago
- ☆35 · Updated 6 months ago
- The latest papers on detection of LLM-generated text and code ☆247 · Updated last month
- ☆112 · Updated 5 months ago
- [ACL 2024 Main] Data and code for WaterBench: Towards Holistic Evaluation of LLM Watermarks ☆22 · Updated last year
- Continuously updated list of resources on generative LLMs such as GPT, and on their analysis and detection ☆210 · Updated 3 weeks ago
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆87 · Updated 6 months ago
- Original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆216 · Updated last year
- Code for watermarking language models ☆75 · Updated 5 months ago
- ☆554 · Updated 11 months ago
- LLM Unlearning ☆140 · Updated last year
- [ACL 2024] SALAD benchmark & MD-Judge ☆125 · Updated 2 months ago
- Repository for Towards Codable Watermarking for Large Language Models ☆35 · Updated last year
- A survey on harmful fine-tuning attacks on large language models ☆135 · Updated this week
- ☆74 · Updated 2 weeks ago
- S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models ☆52 · Updated this week
- [NDSS'25 Poster] A collection of automated evaluators for assessing jailbreak attempts ☆110 · Updated this week
- Hide and Seek (HaS): A Framework for Prompt Privacy Protection ☆33 · Updated last year
- [ACL 2024] Official repo for SafetyBench, a comprehensive benchmark for evaluating LLM safety ☆190 · Updated 7 months ago
- [NeurIPS 2024] Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs; empirical tricks for LLM jailbreaking ☆117 · Updated 2 months ago
- [NeurIPS 2024] Official implementation of "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆92 · Updated 3 weeks ago
- [ACL 2024] Official repository for SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆117 · Updated 7 months ago
- 😎 Up-to-date, curated list of papers, methods, and resources on attacks against large vision-language models ☆213 · Updated this week
- [EMNLP 2024 Findings] ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors ☆173 · Updated 4 months ago
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models ☆103 · Updated this week
- [NAACL 2024] Multi-bit language model watermarking ☆11 · Updated 5 months ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆276 · Updated 11 months ago
- Papers and resources on the security and privacy of LLMs 🤖 ☆478 · Updated 2 months ago
- Agent Security Bench (ASB) ☆62 · Updated last week
- [NAACL 2024] Official implementation of "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ☆92 · Updated last month