THU-BPM / MarkLLM
MarkLLM: An Open-Source Toolkit for LLM Watermarking. (EMNLP 2024 Demo)
☆278 · Updated this week
Related projects
Alternatives and complementary repositories for MarkLLM
- ☆30 · Updated 2 months ago
- UP-TO-DATE LLM Watermark paper. 🔥🔥🔥 ☆285 · Updated 4 months ago
- 😎 Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ☆123 · Updated this week
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆156 · Updated last month
- ☆86 · Updated 2 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆103 · Updated 3 weeks ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆154 · Updated 4 months ago
- The latest papers on detection of LLM-generated text and code ☆215 · Updated this week
- Code for watermarking language models ☆72 · Updated 2 months ago
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆76 · Updated 3 months ago
- S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models ☆41 · Updated last week
- [ACL2024-Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks ☆18 · Updated 11 months ago
- The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models". ☆242 · Updated 2 weeks ago
- A collection of automated evaluators for assessing jailbreak attempts. ☆71 · Updated 4 months ago
- Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873 ☆120 · Updated 6 months ago
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ☆166 · Updated 3 weeks ago
- Repository for Towards Codable Watermarking for Large Language Models ☆29 · Updated last year
- A survey on harmful fine-tuning attacks for large language models ☆69 · Updated this week
- ☆31 · Updated 4 months ago
- LLM Unlearning ☆123 · Updated last year
- Accepted by IJCAI-24 Survey Track