fra31/rlhf-trojan-competition-submission

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/fra31/rlhf-trojan-competition-submission)

fra31 / rlhf-trojan-competition-submission

☆19

Alternatives and similar repositories for rlhf-trojan-competition-submission

Users that are interested in rlhf-trojan-competition-submission are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zmw12306 / ParDiff
View on GitHub
A Static Differential Analysis Tool of Network Protocol Parsers
☆30Feb 21, 2024Updated 2 years ago
tml-epfl / long-is-more-for-alignment
View on GitHub
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
☆21May 2, 2024Updated 2 years ago
ethz-spylab / rlhf_trojan_competition
View on GitHub
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
☆119Jun 13, 2024Updated 2 years ago
tml-epfl / llm-adaptive-attacks
View on GitHub
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆391Jan 23, 2025Updated last year
SolidShen / RIPPLE_official
View on GitHub
☆20Feb 11, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
max-andr / adversarial-random-search-gpt4
View on GitHub
Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023]
☆43Apr 28, 2024Updated 2 years ago
tml-epfl / icl-alignment
View on GitHub
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆33Jan 23, 2025Updated last year
chichidd / llm-lora-trojan
View on GitHub
Code for paper "The Philosopher’s Stone: Trojaning Plugins of Large Language Models"
☆33Sep 11, 2024Updated last year
facebookresearch / jailbreak-objectives
View on GitHub
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆37Jul 2, 2026Updated 3 weeks ago
Shi-D / IMPapers
View on GitHub
Influence Maximization Paper List
☆11May 11, 2022Updated 4 years ago
AISIGSJTU / Siren
View on GitHub
Siren: Byzantine-robust Federated Learning via Proactive Alarming (SoCC '21)
☆11Mar 28, 2024Updated 2 years ago
RJ-T / NIPS2022_EP_BNP
View on GitHub
Official Implementation of NIPS 2022 paper Pre-activation Distributions Expose Backdoor Neurons
☆15Jan 13, 2023Updated 3 years ago
PurduePAML / Exray
View on GitHub
☆12May 27, 2022Updated 4 years ago
lynnegaogao / TransforLearn
View on GitHub
Interactive Visual Tutorial for the Transformer Model
☆12Sep 26, 2023Updated 2 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
ml-jku / align-rudder
View on GitHub
Code to reproduce results on toy tasks and companion blog for the paper.
☆24Jun 8, 2022Updated 4 years ago
Enoch-Liu / NdkMediaCodecDemo
View on GitHub
Android native mediacodec decode/encode demo
☆14Dec 16, 2021Updated 4 years ago
SRI-CSL / TrinityMultimodalTrojAI
View on GitHub
☆35Jun 27, 2022Updated 4 years ago
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆15Dec 16, 2024Updated last year
real-absolute-AI / Unnatural_Language
View on GitHub
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24May 20, 2025Updated last year
njuaplusplus / Elijah
View on GitHub
☆13May 1, 2024Updated 2 years ago
Megum1 / LOTUS
View on GitHub
[CVPR'24] LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
☆15Apr 17, 2026Updated 3 months ago
amberyzheng / IMMA
View on GitHub
[ECCV2024] Immunizing text-to-image Models against Malicious Adaptation
☆18Jan 17, 2025Updated last year
YuejiangLIU / csl
View on GitHub
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
☆15Feb 26, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
samueljero / snake
View on GitHub
State-based Network AttacK Explorer (SNAKE) code
☆14Sep 6, 2019Updated 6 years ago
ZhangZhuoSJTU / LINT
View on GitHub
☆17Sep 4, 2024Updated last year
abertsch72 / long-context-icl
View on GitHub
Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration"
☆44Aug 20, 2024Updated last year
XZ-X / PEM
View on GitHub
☆16Dec 29, 2023Updated 2 years ago
dpaleka / stealing-part-lm-supplementary
View on GitHub
Some code for "Stealing Part of a Production Language Model"
☆23Mar 20, 2024Updated 2 years ago
Gwinhen / HardBeat
View on GitHub
This is the repository for USENIX Security 2023 paper "Hard-label Black-box Universal Adversarial Patch Attack".
☆15Sep 5, 2023Updated 2 years ago
SolidShen / BAIT
View on GitHub
🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access
☆57Jun 2, 2025Updated last year
PurduePAML / DBS
View on GitHub
☆18Aug 15, 2022Updated 3 years ago
T1aNS1R / Evil-Geniuses
View on GitHub
☆71Feb 4, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
konpanousis / Adversarial-LWTA-AutoAttack
View on GitHub
☆12May 6, 2022Updated 4 years ago
centerforaisafety / tdc2023-starter-kit
View on GitHub
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆92May 19, 2024Updated 2 years ago
zhaoyiran924 / Probe-Sampling
View on GitHub
[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
☆35Nov 8, 2024Updated last year
logsem / cerise
View on GitHub
Formalisation of a capability machine and principles for reasoning about security properties
☆27Mar 12, 2026Updated 4 months ago
SheltonLiu-N / Universal-Prompt-Injection
View on GitHub
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆73Oct 23, 2024Updated last year
Megum1 / ODSCAN
View on GitHub
[IEEE S&P'24] ODSCAN: Backdoor Scanning for Object Detection Models
☆22Oct 5, 2025Updated 9 months ago
PurduePAML / K-ARM_Backdoor_Optimization
View on GitHub
☆18Jun 15, 2021Updated 5 years ago