Weixin-Liang / ChatGPT-Detector-Bias
☆38, updated last year
Alternatives and similar repositories for ChatGPT-Detector-Bias:
Users interested in ChatGPT-Detector-Bias are comparing it to the repositories listed below.
- Official Repository for Dataset Inference for LLMs (☆32, updated 8 months ago)
- The code and data for "Are Large Pre-Trained Language Models Leaking Your Personal Information?" (Findings of EMNLP '22) (☆18, updated 2 years ago)
- In-context Example Selection with Influences (☆15, updated last year)
- ☆53, updated 10 months ago
- ☆42, updated last month
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…" (☆47, updated 2 years ago)
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs (☆37, updated last month)
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" (☆95, updated last month)
- A lightweight library for large language model (LLM) jailbreaking defense (☆48, updated 5 months ago)
- Official repository for "PostMark: A Robust Blackbox Watermark for Large Language Models" (☆24, updated 7 months ago)
- Interpretable unified language safety checking with large language models (☆30, updated last year)
- ☆30, updated 3 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" (Findings of NAACL 2022) (☆29, updated 2 years ago)
- ☆53, updated 2 years ago
- ☆104, updated 11 months ago
- ☆19, updated last year
- ☆25, updated 6 months ago
- ☆11, updated 2 years ago
- A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings) (☆26, updated 3 years ago)
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM (☆59, updated 5 months ago)
- Transformer-based model for learning authorship representations (☆35, updated 7 months ago)
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" (☆33, updated last month)
- ☆37, updated 3 weeks ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… (☆108, updated 11 months ago)
- ☆44, updated 6 months ago
- [TACL] Code for "Red Teaming Language Model Detectors with Language Models" (☆19, updated last year)
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning (☆89, updated 10 months ago)
- Repo for "When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment" (☆38, updated last year)
- ☆38, updated last year
- Weak-to-Strong Jailbreaking on Large Language Models (☆72, updated last year)