THU-BPM / MarkLLM
MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 Demo)
☆292 · Updated this week
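Among the schemes MarkLLM implements is the KGW "green-list" watermark (Kirchenbauer et al., 2023): the vocabulary is pseudo-randomly split per step using a hash of the previous token, generation is biased toward the "green" half, and detection counts green hits against the chance rate via a z-score. The sketch below is illustrative only — a toy vocabulary and an extreme sampler that picks exclusively green tokens, not MarkLLM's actual API.

```python
# Minimal sketch of the green-list watermark. Assumptions: toy vocabulary
# of integer token ids; the "model" samples only green tokens (a real LM
# instead adds a logit bias delta to green tokens before softmax).
import hashlib
import math
import random

VOCAB_SIZE = 1000
GAMMA = 0.5  # fraction of the vocabulary put on the green list


def green_list(prev_token: int) -> set:
    """Green token ids for this step, seeded by a hash of the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def generate_watermarked(length: int, start: int = 0) -> list:
    # Toy generator: always sample from the current green list.
    rng, tokens, prev = random.Random(0), [], start
    for _ in range(length):
        tok = rng.choice(sorted(green_list(prev)))
        tokens.append(tok)
        prev = tok
    return tokens


def detect_z_score(tokens, start: int = 0) -> float:
    # Count green hits; unwatermarked text hits the green list at rate
    # GAMMA by chance, so standardize against that binomial baseline.
    hits, prev = 0, start
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    t = len(tokens)
    return (hits - GAMMA * t) / math.sqrt(GAMMA * (1 - GAMMA) * t)


watermarked = generate_watermarked(200)
rng = random.Random(1)
unmarked = [rng.randrange(VOCAB_SIZE) for _ in range(200)]
print(detect_z_score(watermarked))  # strongly positive: every token is green
print(detect_z_score(unmarked))     # near zero
```

Because detection needs only the hash scheme, not the model, anyone holding the key can verify a text — which is also why several repositories below study attacks on and public verifiability of such watermarks.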
Related projects
Alternatives and complementary repositories for MarkLLM
- UP-TO-DATE LLM Watermark paper. 🔥🔥🔥 ☆293 · Updated this week
- 😎 Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ☆133 · Updated last week
- A collection of automated evaluators for assessing jailbreak attempts. ☆75 · Updated 4 months ago
- The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models". ☆245 · Updated 3 weeks ago
- [ACL 2024] SALAD benchmark & MD-Judge ☆106 · Updated last month
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆158 · Updated last month
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆76 · Updated 3 months ago
- A survey on harmful fine-tuning attacks for large language models ☆80 · Updated last week
- The latest papers on detection of LLM-generated text and code ☆216 · Updated last week
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆241 · Updated 8 months ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆157 · Updated 4 months ago
- Accepted by IJCAI-24 Survey Track ☆159 · Updated 2 months ago
- Accepted by ECCV 2024 ☆74 · Updated last month
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc. ☆174 · Updated last month
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆61 · Updated last month
- Code for watermarking language models ☆72 · Updated 2 months ago
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding" ☆99 · Updated 4 months ago
- Repository for "Towards Codable Watermarking for Large Language Models" ☆29 · Updated last year
- [ACL 2024 Main] Data and code for WaterBench: Towards Holistic Evaluation of LLM Watermarks ☆18 · Updated last year
- S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models ☆42 · Updated 2 weeks ago
- A resource repository for machine unlearning in large language models ☆218 · Updated last week
- Source code of the paper "An Unforgeable Publicly Verifiable Watermark for Large Language Models", accepted at ICLR 2024 ☆28 · Updated 5 months ago
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873) ☆122 · Updated 6 months ago
- LLM Unlearning ☆125 · Updated last year