reds-lab/BEEAR

This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models".

☆23

Alternatives and similar repositories for BEEAR

Users who are interested in BEEAR are comparing it to the repositories listed below.

PurduePAML / DBS
☆18 · Aug 15, 2022 · Updated 3 years ago
thestephencasper / latent_adversarial_training
☆24 · Jul 25, 2024 · Updated last year
GiantSeaweed / DECREE
Official repository for CVPR'23 paper: Detecting Backdoors in Pre-trained Encoders
☆39 · Sep 25, 2023 · Updated 2 years ago
clearloveclearlove / BEAT
☆15 · Feb 26, 2025 · Updated last year
AI-secure / TextGuard
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
☆15 · Nov 7, 2023 · Updated 2 years ago
huawei-lin / UniGuardian
The implementation for paper "UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in …
☆17 · Jul 3, 2025 · Updated last year
ethz-spylab / rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆67 · Apr 24, 2024 · Updated 2 years ago
YiZeng623 / I-BAU
Official Implementation of ICLR 2022 paper, "Adversarial Unlearning of Backdoors via Implicit Hypergradient"
☆53 · Nov 16, 2022 · Updated 3 years ago
aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
fjzzq2002 / random_transformers
Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024)
☆15 · Sep 28, 2024 · Updated last year
usnistgov / trojai-example
Example TrojAI Submission
☆27 · Dec 6, 2024 · Updated last year
ethz-spylab / rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
☆119 · Jun 13, 2024 · Updated 2 years ago
PurduePAML / PICCOLO
☆26 · Dec 1, 2022 · Updated 3 years ago
meng-wenlong / LMSanitator
☆29 · Aug 21, 2023 · Updated 2 years ago
SolidShen / BAIT
🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access
☆57 · Jun 2, 2025 · Updated last year
ZephrFish / CVE-2021-28480_HoneyPoC3
DO NOT RUN THIS.
☆10 · Jul 15, 2021 · Updated 5 years ago
MiracleHH / CBA
Composite Backdoor Attacks Against Large Language Models
☆25 · Apr 12, 2024 · Updated 2 years ago
MAGAer13 / DeCapBench
Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)
☆14 · Mar 6, 2025 · Updated last year
reds-lab / Narcissus
The official implementation of the CCS'23 paper, Narcissus clean-label backdoor attack -- only takes THREE images to poison a face recogn…
☆128 · May 9, 2023 · Updated 3 years ago
YefanZhou / TempBalance
[NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
☆37 · Apr 7, 2025 · Updated last year
staymylove / COT_Compresstion_via_Step_entropy
☆27 · Aug 8, 2025 · Updated 11 months ago
zhangrui4041 / Instruction_Backdoor_Attack
☆25 · Aug 21, 2024 · Updated last year
Shi-D / IMPapers
Influence Maximization Paper List
☆11 · May 11, 2022 · Updated 4 years ago
xingyizhao / PURE
Code associated with ICML (2024). "Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normaliz…
☆11 · Feb 22, 2026 · Updated 5 months ago
LucasFenaux / PILLAR-ESPN
Code for the paper: Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions
☆12 · Mar 13, 2024 · Updated 2 years ago
xiaopp123 / knowledge_distillation
BERT distillation in practice, including distilling BERT into a BiLSTM and TinyBert
☆13 · Apr 23, 2022 · Updated 4 years ago
jinghuichen / AWM
GitHub repo for One-shot Neural Backdoor Erasing via Adversarial Weight Masking (NeurIPS 2022)
☆15 · Jan 3, 2023 · Updated 3 years ago
imperial-aisp / mia_llms_benchmark
Benchmarking MIAs against LLMs.
☆30 · Oct 8, 2024 · Updated last year
qizhangli / Gradient-based-Jailbreak-Attacks
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12 · Nov 7, 2024 · Updated last year
RU-System-Software-and-Security / NONE
☆10 · Oct 31, 2022 · Updated 3 years ago
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆92 · May 19, 2024 · Updated 2 years ago
ZrW00 / GraCeFul
The code implementation of GraCeFul (Accepted in COLING 2025)
☆13 · Jan 27, 2025 · Updated last year
lynnegaogao / TransforLearn
Interactive Visual Tutorial for the Transformer Model
☆12 · Sep 26, 2023 · Updated 2 years ago
declare-lab / safety-arithmetic
☆13 · Jan 14, 2025 · Updated last year
sophie-xhonneux / Continuous-AdvTrain
☆36 · Apr 13, 2026 · Updated 3 months ago
yihedeng9 / rlhf-summary-notes
A brief and partial summary of RLHF algorithms.
☆152 · Mar 4, 2025 · Updated last year
zhangxin00 / segscope
Proof-of-concept implementation for the paper "SegScope: Probing Fine-grained Interrupts via Architectural Footprints" (HPCA'24)
☆20 · Apr 2, 2026 · Updated 3 months ago
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
ucsb-seclab / BullseyePoison
Bullseye Polytope Clean-Label Poisoning Attack
☆18 · Nov 5, 2020 · Updated 5 years ago