uiuc-arc / llm-code-watermark

LLM Program Watermarking

☆18

Alternatives and similar repositories for llm-code-watermark

Users who are interested in llm-code-watermark are comparing it to the repositories listed below.

facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆70 Updated last month
RobustNLP / DeRTa
A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.
☆65 Updated 3 months ago
tml-epfl / llm-past-tense
Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]
☆74 Updated 7 months ago
JonasGeiping / carving
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
☆70 Updated last year
jthickstun / watermark
Code for watermarking language models
☆82 Updated last year
facebookresearch / three_bricks
Official Implementation of the paper "Three Bricks to Consolidate Watermarks for LLMs"
☆48 Updated last year
amazon-science / Repoformer
Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024)
☆59 Updated 3 months ago
azshue / AutoPoison
The official repository of the paper "On the Exploitability of Instruction Tuning".
☆64 Updated last year
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆91 Updated 9 months ago
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆99 Updated 7 months ago
PurCL / ProSec
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"
☆15 Updated 5 months ago
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆143 Updated last year
amazon-science / controlling-llm-memorization
☆39 Updated 2 years ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆148 Updated 5 months ago
SalesforceAIResearch / indict_code_gen
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
☆15 Updated 8 months ago
Vaidehi99 / InfoDeletionAttacks
☆44 Updated 7 months ago
Princeton-SysML / Jailbreak_LLM
☆183 Updated last year
andyzoujm / breaking-llama-guard
Code to break Llama Guard
☆32 Updated last year
weizeming / momentum-attack-llm
☆23 Updated 8 months ago
arobey1 / smooth-llm
☆107 Updated last year
allenai / wildteaming
☆34 Updated last year
iamgroot42 / mimir
Python package for measuring memorization in LLMs.
☆166 Updated 2 months ago
collinzrj / output2prompt
☆46 Updated 6 months ago
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆102 Updated 6 months ago
BrianPulfer / LMWatermark
Implementation of 'A Watermark for Large Language Models' paper by Kirchenbauer & Geiping et al.
☆24 Updated 2 years ago
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆97 Updated last year
THU-BPM / Robust_Watermark
Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024.
☆34 Updated 10 months ago
hongcheki / sweet-watermark
Official repository of the paper: Who Wrote this Code? Watermarking for Code Generation (ACL 2024)
☆38 Updated last year
SproutNan / AI-Safety_Benchmark
The official repository for guided jailbreak benchmark
☆18 Updated last month
AI-secure / RedCode
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
☆48 Updated 2 months ago