Xianjun-Yang / Awesome_papers_on_LLMs_detection
The latest papers on detection of LLM-generated text and code
★216 · Updated last week
Related projects
Alternatives and complementary repositories for Awesome_papers_on_LLMs_detection
- Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection. ★197 · Updated 2 months ago
- UP-TO-DATE LLM Watermark paper. 🔥🔥🔥 ★293 · Updated this week
- A survey and reflection on the latest research breakthroughs in LLM-generated Text detection, including data, detectors, metrics, current… ★173 · Updated last year
- A resource repository for machine unlearning in large language models ★218 · Updated last week
- Code for watermarking language models ★72 · Updated 2 months ago
- LLM Unlearning ★125 · Updated last year
- 【ACL 2024】 SALAD benchmark & MD-Judge ★106 · Updated last month
- Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense… ★138 · Updated last year
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ★76 · Updated 3 months ago
- Up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources. ★133 · Updated last week
- SeqXGPT: An advanced method for sentence-level AI-generated text detection. ★76 · Updated last year
- A collection of automated evaluators for assessing jailbreak attempts. ★75 · Updated 4 months ago
- Accepted by ECCV 2024 ★74 · Updated last month
- ★31 · Updated 5 months ago
- A survey on harmful fine-tuning attacks for large language models ★80 · Updated last week
- LLM hallucination paper list ★293 · Updated 8 months ago
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?" ★52 · Updated last week
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ★71 · Updated 2 months ago
- ★30 · Updated 3 months ago
- ICLR 2024 paper showing properties of safety tuning and exaggerated safety. ★71 · Updated 6 months ago
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024) ★56 · Updated 4 months ago
- Accepted by IJCAI-24 Survey Track ★159 · Updated 2 months ago
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ★23 · Updated last year
- [ACL2024-Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks ★18 · Updated last year
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ★174 · Updated last month
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning ★153 · Updated 9 months ago
- ★111 · Updated last year
- ★185 · Updated 3 weeks ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang… ★74 · Updated 4 months ago
- The code implementation of the paper CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Low Resource With Contrastive Learni… ★12 · Updated 7 months ago