shizhouxing / LLM-Detector-Robustness
[TACL] Code for "Red Teaming Language Model Detectors with Language Models"
☆16 Updated 11 months ago
Related projects
Alternatives and complementary repositories for LLM-Detector-Robustness
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022 ☆27 Updated 2 years ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆45 Updated last month
- [NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua… ☆29 Updated last year
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models ☆36 Updated 2 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 Updated 5 months ago
- Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misin… ☆85 Updated this week
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models ☆23 Updated last year
- Multilingual safety benchmark for Large Language Models ☆22 Updated 2 months ago
- Code for the paper: ConDA: Contrastive Domain Adaptation for AI-generated Text Detection ☆32 Updated 10 months ago
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?" ☆51 Updated this week
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 ☆58 Updated last month
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆70 Updated 2 months ago
- Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936 ☆26 Updated 5 months ago
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria… ☆34 Updated last year
- DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text ☆25 Updated last year
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆59 Updated 8 months ago
- Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid… ☆23 Updated last year
- A Survey of Hallucination in Large Foundation Models ☆50 Updated 10 months ago
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe) ☆21 Updated 4 months ago
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆58 Updated last month