uiuc-arc / llm-code-watermark
LLM Program Watermarking
☆17 · Updated last year
Alternatives and similar repositories for llm-code-watermark
Users interested in llm-code-watermark are comparing it to the libraries listed below.
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆72 · Updated 6 months ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆63 · Updated 2 weeks ago
- Implementation of the paper 'A Watermark for Large Language Models' by Kirchenbauer et al. (the green-list scheme sketched after this list) ☆24 · Updated 2 years ago
- Whispers in the Machine: Confidentiality in Agentic Systems ☆39 · Updated 2 months ago
- ☆44 · Updated 4 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆63 · Updated 2 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆326 · Updated 6 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆152 · Updated 9 months ago
- Official Implementation of the paper "Three Bricks to Consolidate Watermarks for LLMs" ☆48 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆165 · Updated 4 months ago
- Code to break Llama Guard ☆31 · Updated last year
- Code for watermarking language models ☆80 · Updated 11 months ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆141 · Updated last year
- ☆36 · Updated 2 years ago
- RepoQA: Evaluating Long-Context Code Understanding ☆113 · Updated 9 months ago
- EvoEval: Evolving Coding Benchmarks via LLM ☆76 · Updated last year
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024) ☆56 · Updated last month
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆64 · Updated last year
- Training and Benchmarking LLMs for Code Preference. ☆34 · Updated 8 months ago
- Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment" ☆15 · Updated 4 months ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024. ☆114 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆87 · Updated 8 months ago
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆137 · Updated 3 months ago
- Official implementation of the paper "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers" ☆56 · Updated 11 months ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆317 · Updated last year
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆81 · Updated 9 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆84 · Updated 3 months ago
- ☆32 · Updated last year
- [USENIX Security'24] REMARK-LLM: A robust and efficient watermarking framework for generative large language models ☆25 · Updated 9 months ago
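
Several of the repositories above (the Kirchenbauer et al. implementation, "Three Bricks", and the language-model watermarking code) center on the same green-list idea: at each decoding step, the previous token pseudorandomly selects a "green" subset of the vocabulary whose logits get a small boost, and detection counts how often generated tokens land in their green lists. Below is a minimal sketch, assuming PyTorch; `vocab_size`, `gamma`, and `delta` are illustrative values, and real implementations seed the partition with a keyed hash rather than the raw token id.

```python
# Minimal sketch of the "green-list" soft watermark from Kirchenbauer et al.
# (arXiv 2301.10226). Parameter values below are illustrative assumptions,
# not taken from any of the repositories listed above.
import math
import torch

vocab_size = 50_257   # e.g. a GPT-2-sized vocabulary (assumption)
gamma = 0.25          # fraction of the vocabulary placed in the green list
delta = 2.0           # logit bias added to green-list tokens

def green_list(prev_token: int) -> torch.Tensor:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    g = torch.Generator().manual_seed(prev_token)
    perm = torch.randperm(vocab_size, generator=g)
    return perm[: int(gamma * vocab_size)]

def watermarked_logits(logits: torch.Tensor, prev_token: int) -> torch.Tensor:
    """Boost green-list logits by delta before sampling (the 'soft' watermark)."""
    biased = logits.clone()
    biased[green_list(prev_token)] += delta
    return biased

def detection_z_score(tokens: list[int]) -> float:
    """z-score of the green-token count; large values indicate a watermark."""
    hits = sum(
        t in set(green_list(prev).tolist())
        for prev, t in zip(tokens, tokens[1:])
    )
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

Because unwatermarked text hits green lists only at rate γ by chance, the paper treats a z-score above roughly 4 over a few hundred tokens as strong evidence of the watermark.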