uiuc-arc / llm-code-watermark
LLM Program Watermarking
☆18 Updated last year
Alternatives and similar repositories for llm-code-watermark
Users who are interested in llm-code-watermark are comparing it to the repositories listed below
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆76 Updated 5 months ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆73 Updated 7 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆100 Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 Updated last year
- ☆50 Updated last year
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆77 Updated 11 months ago
- Official Implementation of the paper "Three Bricks to Consolidate Watermarks for LLMs" ☆50 Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆152 Updated last year
- Code for watermarking language models ☆84 Updated last year
- Official code for the paper "CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules" ☆48 Updated last month
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆182 Updated 8 months ago
- ☆18 Updated last year
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment" ☆16 Updated 9 months ago
- ☆53 Updated 9 months ago
- 🔮Reasoning for Safer Code Generation; 🥇Winner Solution of Amazon Nova AI Challenge 2025 ☆34 Updated 4 months ago
- ☆191 Updated 2 years ago
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents ☆61 Updated last month
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆66 Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆251 Updated last year
- ☆114 Updated 8 months ago
- The official repository for guided jailbreak benchmark ☆26 Updated 4 months ago
- ☆114 Updated 2 years ago
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆176 Updated 8 months ago
- This repository contains the source code, datasets, and scripts for the paper "GenderCARE: A Comprehensive Framework for Assessing and Re… ☆27 Updated last year
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆333 Updated last year
- Python package for measuring memorization in LLMs. ☆176 Updated 5 months ago
- Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024. ☆37 Updated last year
- ☆38 Updated last year
- Implementation of 'A Watermark for Large Language Models' paper by Kirchenbauer & Geiping et al. ☆24 Updated 2 years ago
- ☆45 Updated last month