xinleihe / MGTBench
☆131 · Updated last month
Related projects:
- A continuously updated list of resources on generative LLMs such as GPT, covering their analysis and detection. ☆187 · Updated 2 weeks ago
- A collection of automated evaluators for assessing jailbreak attempts. ☆55 · Updated 2 months ago
- The latest papers on the detection of LLM-generated text and code. ☆195 · Updated last week
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents ☆57 · Updated last month
- ☆31 · Updated 4 months ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. ☆141 · Updated 2 months ago
- Awesome LLM Jailbreak academic papers ☆61 · Updated 10 months ago
- ☆37 · Updated 4 months ago
- ☆94 · Updated 8 months ago
- ☆64 · Updated 2 weeks ago
- LLMDet is a text detection tool that identifies the source of a given text (e.g., a large language model or a human writer). ☆48 · Updated 3 months ago
- ☆143 · Updated 9 months ago
- Up-to-date LLM watermarking papers. 🔥🔥🔥 ☆253 · Updated 3 months ago
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense… ☆131 · Updated 10 months ago
- [ACL 2024 Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks ☆17 · Updated 10 months ago
- A survey of privacy problems in Large Language Models (LLMs); contains summaries of the corresponding papers along with relevant code. ☆58 · Updated 3 months ago
- A curated list of trustworthy generative AI papers, updated daily. ☆67 · Updated 2 weeks ago
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker" ☆109 · Updated 7 months ago
- [ACL 2024] SALAD benchmark & MD-Judge ☆81 · Updated this week
- SC-Safety: a multi-round adversarial safety benchmark for Chinese large language models. ☆94 · Updated 6 months ago
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆65 · Updated last month
- Code for the Findings of EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆20 · Updated 11 months ago
- Flames is a highly adversarial Chinese benchmark for evaluating LLM harmlessness, developed by Shanghai AI Lab and the Fudan NLP Group. ☆30 · Updated 3 months ago
- MarkLLM: An Open-Source Toolkit for LLM Watermarking. ☆246 · Updated last month
- Official code for our work on AIGC detection: "Multiscale Positive-Unlabeled Detection of AI-Generated Texts" (ICLR'24 Spotlight). ☆96 · Updated 8 months ago
- [ICLR 2024] Paper showing properties of safety tuning and exaggerated safety. ☆61 · Updated 4 months ago
- Official code for the ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid… ☆22 · Updated last year
- Official implementation of the paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆26 · Updated 3 weeks ago
- LLM Unlearning ☆112 · Updated 11 months ago
- 😎 An up-to-date, curated list of papers, methods, and resources on attacks against Large Vision-Language Models. ☆73 · Updated this week