Xianjun-Yang / Awesome_papers_on_LLMs_detection
The latest papers on detection of LLM-generated text and code
★195 · Updated last week
Related projects:
- Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection. ★187 · Updated 2 weeks ago
- UP-TO-DATE LLM Watermark paper. 🔥🔥🔥 ★253 · Updated 3 months ago
- LLM Unlearning ★112 · Updated 11 months ago
- Code for watermarking language models ★69 · Updated last week
- [ACL 2024] SALAD benchmark & MD-Judge ★81 · Updated this week
- A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current… ★167 · Updated 10 months ago
- A resource repository for machine unlearning in large language models ★131 · Updated this week
- ★27 · Updated 3 months ago
- Accepted by IJCAI-24 Survey Track ★117 · Updated 3 weeks ago
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ★65 · Updated last month
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ★64 · Updated 2 weeks ago
- [ACL2024-Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks ★17 · Updated 10 months ago
- SeqXGPT: An advanced method for sentence-level AI-generated text detection. ★69 · Updated 11 months ago
- Paper list of misinformation research using (multi-modal) large language models, i.e., (M)LLMs. ★104 · Updated last week
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ★65 · Updated 2 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ★79 · Updated 3 months ago
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense… ★131 · Updated 10 months ago
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models ★61 · Updated last week
- Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ★73 · Updated this week
- MarkLLM: An Open-Source Toolkit for LLM Watermarking. ★246 · Updated last month
- Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873 ★110 · Updated 4 months ago
- A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current… ★57 · Updated 8 months ago
- ★101 · Updated last year
- ★143 · Updated 9 months ago
- A collection of automated evaluators for assessing jailbreak attempts. ★55 · Updated 2 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ★22 · Updated 11 months ago
- Accepted by ECCV 2024 ★59 · Updated 2 months ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ★61 · Updated 4 months ago
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?" ★45 · Updated last month
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ★55 · Updated 2 months ago