xinleihe / MGTBench

☆159

Alternatives and similar repositories for MGTBench

Users who are interested in MGTBench are comparing it to the repositories listed below.

GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆95 Updated 6 months ago
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106 Updated last year
IS2Lab / S-Eval
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
☆76 Updated last month
Allen-piexl / JailbreakZoo
☆144 Updated 11 months ago
Aatrox103 / SAP
☆45 Updated last year
Xianjun-Yang / Awesome_papers_on_LLMs_detection
The latest papers about detection of LLM-generated text and code
☆275 Updated last month
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆34 Updated 6 months ago
safr-ai-lab / survey-llm
A survey of privacy problems in Large Language Models (LLMs). Contains summary of the corresponding paper along with relevant code
☆68 Updated last year
hzy312 / Awesome-LLM-Watermark
UP-TO-DATE LLM Watermark paper. 🔥🔥🔥
☆351 Updated 7 months ago
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆144 Updated 8 months ago
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆34 Updated last year
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆41 Updated last year
WhileBug / AwesomeLLMJailBreakPapers
Awesome LLM Jailbreak academic papers
☆104 Updated last year
ICTMCG / Awesome-Machine-Generated-Text
Continuously updated list of related resources for generative LLMs like GPT and their analysis and detection.
☆223 Updated 2 months ago
martiansideofthemoon / ai-detection-paraphrases
Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense…
☆173 Updated last year
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆205 Updated 10 months ago
DAMO-NLP-SG / multilingual-safety-for-LLMs
[ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆79 Updated last year
thu-coai / SafetyBench
Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆236 Updated last week
thunlp / OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
☆185 Updated 2 years ago
AISG-Technology-Team / GCSS-Track-1A-Submission-Guide
Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A).
☆16 Updated last year
thunlp / Advbench
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…
☆54 Updated 2 years ago
thu-coai / Targeted-Data-Extraction
Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid…
☆23 Updated 2 years ago
THU-BPM / Robust_Watermark
Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024.
☆32 Updated 8 months ago
tmlr-group / DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
☆157 Updated last year
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆73 Updated 11 months ago
OSU-NLP-Group / AgentSafety
☆99 Updated 3 months ago
agiresearch / ASB
Agent Security Bench (ASB)
☆102 Updated last month
THU-KEG / WaterBench
[ACL2024-Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks
☆28 Updated last year
AI45Lab / ActorAttack
☆97 Updated 6 months ago
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆69 Updated 9 months ago