AAAAAAsuka/llm_defends

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AAAAAAsuka/llm_defends)

AAAAAAsuka / llm_defends

code of paper "Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM"

☆14

Alternatives and similar repositories for llm_defends

Users that are interested in llm_defends are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

CSQianDong / MBGE-recognition
View on GitHub
Code for MBGE-recognition: Emotion recognition based on multi-view body gestures, accepted at ICIP 2019.
☆12Apr 6, 2023Updated 3 years ago
CSQianDong / R-former
View on GitHub
Code for R-former: Legal Judgment Prediction via Relational Learning, accepted at SIGIR 2021.
☆23Feb 21, 2022Updated 4 years ago
CSQianDong / KERM
View on GitHub
Code for KERM: Incorporating Explicit Knowledge in Pre-trained Language Models for Passage Re-ranking, accepted at SIGIR 2022.
☆19Oct 31, 2022Updated 3 years ago
jinghuichen / FedCAMS
View on GitHub
Github Repo for ICML 2022 paper: Communication-Efficient Adaptive Federated Learning
☆10Nov 18, 2022Updated 3 years ago
PeixianChen / citation-count
View on GitHub
基于 Google Scholar 的论文他引次数统计。
☆14Dec 8, 2022Updated 3 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
sleeepeer / PISanitizer
View on GitHub
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
☆18Dec 10, 2025Updated 7 months ago
huaxiuyao / Wild-Time
View on GitHub
Benchmark for Natural Temporal Distribution Shift (NeurIPS 2022)
☆68Mar 29, 2023Updated 3 years ago
huaxiuyao / HSML_Dynamic
View on GitHub
HSML Dynamic version for ICML 2019
☆12Jul 11, 2019Updated 7 years ago
albert-y1n / PISmith
View on GitHub
PISmith: Reinforcement Learning-based Red Teaming for Prompt Injection Defenses
☆21Updated this week
jam3scampbell / llama-lying
View on GitHub
Code for our paper "Localizing Lying in Llama"
☆15Apr 24, 2025Updated last year
AAAAAAsuka / Impress
View on GitHub
code of paper "IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Gene…
☆35May 23, 2024Updated 2 years ago
XiudiChen / ConPL
View on GitHub
☆10Aug 22, 2023Updated 2 years ago
kay-ck / BSC-Attack
View on GitHub
[AAAI2022] Code Release of Attacking Video Recognition Models with Bullet-Screen Comments
☆25Mar 30, 2024Updated 2 years ago
Yifan-Song793 / InfoCL
View on GitHub
Findings of EMNLP 2023: InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspe…
☆14Aug 13, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
mhw32 / prototransformer-public
View on GitHub
PyTorch implementation for "ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback" (https://arxiv.org/abs/2107.14035).
☆16Sep 9, 2022Updated 3 years ago
MadryLab / AT2
View on GitHub
Attribute statements generated by LLMs to preceding tokens using attention weights.
☆28Apr 22, 2025Updated last year
youngwoo-yoon / 2d_to_3d_human_pose_converter
View on GitHub
☆11Apr 6, 2019Updated 7 years ago
illidanlab / FOSTER
View on GitHub
☆16May 18, 2023Updated 3 years ago
YancyKahn / CoA
View on GitHub
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39Jan 17, 2025Updated last year
seonhee99 / EHR-SeqSQL
View on GitHub
Official repository of "EHR-SeqSQL : A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records" (ACL 2024 Fi…
☆17Jul 5, 2024Updated 2 years ago
THUIR / THUIR-website
View on GitHub
THUIR website
☆10Feb 23, 2026Updated 4 months ago
baixianghuang / editing-attack
View on GitHub
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
☆21Dec 26, 2025Updated 6 months ago
khhung-906 / Attention-Tracker
View on GitHub
Code for our NAACL2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs
☆28Sep 19, 2025Updated 10 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Spinozaaa / AdvI2I
View on GitHub
☆24Sep 5, 2025Updated 10 months ago
zichuan-liu / IB4LLMs
View on GitHub
[NeurIPS'24] Protecting Your LLMs with Information Bottleneck
☆25Nov 7, 2024Updated last year
xingchenzhao / Generating-Human-Skeletons-with-Mutual-Actions-WGAN-Pytorch
View on GitHub
Generating Human Skeletons with Mutual Actions
☆11Oct 22, 2021Updated 4 years ago
PositionalHidden / PositionalHidden
View on GitHub
To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and …
☆12Jun 18, 2024Updated 2 years ago
ysh-1998 / CSRec
View on GitHub
The official implementation for Common Sense Enhanced Knowledge-based Recommendation with Large Language Model
☆15Apr 21, 2024Updated 2 years ago
tsachiblau / Threat-Model-Agnostic-Adversarial-Defense-using-Diffusion-Models
View on GitHub
☆12Jul 19, 2022Updated 4 years ago
sleeepeer / PIArena
View on GitHub
[ACL 2026] PIArena: A Platform for Prompt Injection Evaluation
☆39Apr 28, 2026Updated 2 months ago
xiaofen9 / chatgpt-writting-extension
View on GitHub
A browser extension that enhance your paper writting with ChatGPT
☆27May 14, 2024Updated 2 years ago
megagonlabs / zett
View on GitHub
Code for Zero-shot Triplet Extraction by Template Infilling (Kim et al; IJCNLP-AACL 2023)
☆21Feb 17, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
tonychenxyz / selfie
View on GitHub
This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,…
☆58Dec 9, 2024Updated last year
xunguangwang / SoK4JailbreakGuardrails
View on GitHub
[S&P 2026] SoK: Evaluating Jailbreak Guardrails for Large Language Models
☆44Dec 17, 2025Updated 7 months ago
scott89 / deeplens_eval
View on GitHub
☆37Jun 8, 2019Updated 7 years ago
LZU-SIAT / PCRP
View on GitHub
Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton based Action Recognition
☆11Aug 30, 2021Updated 4 years ago
huaxiuyao / ATS
View on GitHub
ATS for NeurIPS 2021
☆24Nov 4, 2021Updated 4 years ago
WinnieHAN / structure_adv
View on GitHub
☆10Oct 28, 2020Updated 5 years ago
EricLee8 / SPACE
View on GitHub
The official codes for our paper at COLING 2022: Semantic-Preserving Adversarial Code Comprehension
☆12Oct 23, 2022Updated 3 years ago