Weixin-Liang / ChatGPT-Detector-Bias
☆38, updated last year
Alternatives and similar repositories for ChatGPT-Detector-Bias:
Users interested in ChatGPT-Detector-Bias are comparing it to the repositories listed below.
- Official Repository for Dataset Inference for LLMs (☆32, updated 8 months ago)
- The code and data for "Are Large Pre-Trained Language Models Leaking Your Personal Information?" (Findings of EMNLP '22) (☆18, updated 2 years ago)
- In-context Example Selection with Influences (☆15, updated last year)
- ☆53, updated 10 months ago
- ☆42, updated last month
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…" (☆47, updated 2 years ago)
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs (☆37, updated last month)
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" (☆95, updated last month)
- A lightweight library for large language model (LLM) jailbreaking defense (☆48, updated 5 months ago)
- Official repository for "PostMark: A Robust Blackbox Watermark for Large Language Models" (☆24, updated 7 months ago)
- Interpretable unified language safety checking with large language models (☆30, updated last year)
- ☆30, updated 3 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" (Findings of NAACL 2022) (☆29, updated 2 years ago)
- ☆53, updated 2 years ago
- ☆104, updated 11 months ago
- ☆19, updated last year
- ☆25, updated 6 months ago
- ☆11, updated 2 years ago
- A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings) (☆26, updated 3 years ago)
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM (☆59, updated 5 months ago)
- Transformer-based model for learning authorship representations (☆35, updated 7 months ago)
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" (☆33, updated last month)
- ☆37, updated 3 weeks ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… (☆108, updated 11 months ago)
- ☆44, updated 6 months ago
- [TACL] Code for "Red Teaming Language Model Detectors with Language Models" (☆19, updated last year)
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning (☆89, updated 10 months ago)
- Repo for "When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment" (☆38, updated last year)
- ☆38, updated last year
- Weak-to-Strong Jailbreaking on Large Language Models (☆72, updated last year)