thestephencasper/latent_adversarial_training

☆24

Alternatives and similar repositories for latent_adversarial_training

Users who are interested in latent_adversarial_training are comparing it to the repositories listed below.

aengusl / latent-adversarial-training
☆48 · Sep 29, 2024 · Updated last year
ejnnr / cupbearer
A library for mechanistic anomaly detection
☆22 · Jan 9, 2025 · Updated last year
itsvaibhav01 / Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28 · Jun 11, 2025 · Updated last year
Piyush-555 / GaussianDistillation
Data-free knowledge distillation using Gaussian noise (NeurIPS paper)
☆15 · Mar 24, 2023 · Updated 3 years ago
thestephencasper / benchmarking_interpretability
☆35 · Sep 13, 2023 · Updated 2 years ago
LukeBailey181 / obfuscated-activations
Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses
☆31 · Feb 11, 2025 · Updated last year
shoaibahmed / metadata_archaeology
Official code for the paper: "Metadata Archaeology"
☆19 · May 10, 2023 · Updated 3 years ago
TeunvdWeij / sandbagging
☆21 · Nov 15, 2024 · Updated last year
Vinsonzyh / BlueSuffix
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆31 · Nov 2, 2025 · Updated 8 months ago
SCLBD / Transfer_attack_RAP
☆35 · Dec 16, 2022 · Updated 3 years ago
whj363636 / Self-Ensemble-Adversarial-Training
SEAT
☆21 · Oct 10, 2023 · Updated 2 years ago
THUYimingLi / Semi-supervised_Robust_Training
This is the code for semi-supervised robust training (SRT).
☆18 · Mar 24, 2023 · Updated 3 years ago
longtermrisk / openweights
A Python SDK for LLM fine-tuning and inference on RunPod infrastructure
☆30 · May 12, 2026 · Updated 2 months ago
gokulp01 / ComTraq-MPC
[IROS 2024] "ComTraQ-MPC: Meta-Trained DQN-MPC Integration for Trajectory Tracking with Limited Active Localization Updates" by Gokul Put…
☆13 · Apr 10, 2025 · Updated last year
Blkalkin / Optimal-TestTime
☆10 · Mar 24, 2025 · Updated last year
yubol-bobo / MT-Consistency
This repo investigates LLMs' tendency to exhibit acquiescence bias in sequential QA interactions. Includes evaluation methods, datasets, …
☆17 · Apr 24, 2026 · Updated 3 months ago
genglinliu / UnknownBench
Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
☆14 · Feb 20, 2024 · Updated 2 years ago
Muzammal-Naseer / DCViT-AT
Official repository for "Boosting Adversarial Transferability using Dynamic Cues" (ICLR 2023)
☆20 · Aug 24, 2023 · Updated 2 years ago
zhangrui4041 / Instruction_Backdoor_Attack
☆25 · Aug 21, 2024 · Updated last year
SLIT-AI / ADPA
[ICLR2025 Spotlight] Advantage-Guided Distillation for Preference Alignment in Small Language Models
☆26 · Feb 10, 2025 · Updated last year
xiaopp123 / knowledge_distillation
BERT distillation in practice, including distilling BERT into a BiLSTM and TinyBert
☆13 · Apr 23, 2022 · Updated 4 years ago
robgon-art / GreenLIT
GreenLIT: Using GPT-J with Multi-Task Learning to Create New Screenplays
☆16 · Nov 27, 2022 · Updated 3 years ago
DripNowhy / ETA
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆34 · Jul 20, 2025 · Updated last year
Dahoas / gpt-neox-finetuning
☆15 · Mar 12, 2022 · Updated 4 years ago
EleutherAI / training-jacobian
☆24 · Dec 11, 2024 · Updated last year
meridianlabs-ai / inspect_flow
Inspect Flow is a workflow stack built on Inspect AI that enables research organisations to run AI evaluations at scale.
☆17 · Updated this week
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
WanliYoung / Revisit-Editing-Evaluation
Code and data repository for "The Mirage of Model Editing: Revisiting Evaluation in the Wild"
☆18 · Aug 27, 2025 · Updated 11 months ago
tml-epfl / llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆391 · Jan 23, 2025 · Updated last year
cdj0311 / bert_distill_lstm
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks.
☆15 · Aug 28, 2020 · Updated 5 years ago
AISafety-HKUST / Backdoor_Safety_Tuning
Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)
☆27 · Nov 18, 2024 · Updated last year
sophie-xhonneux / Continuous-AdvTrain
☆36 · Apr 13, 2026 · Updated 3 months ago
cleverhans-lab / dataset-inference
[ICLR'21] Dataset Inference for Ownership Resolution in Machine Learning
☆31 · Oct 10, 2022 · Updated 3 years ago
mingdachen / WikiTableT
Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources"
☆21 · Feb 5, 2021 · Updated 5 years ago
yfqiu-nlp / sea-llm
Code for the paper "Spectral Editing of Activations for Large Language Model Alignments"
☆31 · Dec 20, 2024 · Updated last year
OPTML-Group / QF-Attack
[CVPR23W] "A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion" by Haomin Zhuang, Yihua Zhang and Sijia Liu
☆27 · Aug 27, 2024 · Updated last year
declare-lab / trust-align
Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…
☆76 · Mar 3, 2025 · Updated last year
techmonsterwang / iLLaMA
Adapting LLaMA Decoder to Vision Transformer
☆30 · May 20, 2024 · Updated 2 years ago
rpatrik96 / nl-causal-representations
This is the code for the paper Jacobian-based Causal Discovery with Nonlinear ICA, demonstrating how identifiable representations (partic…
☆22 · Sep 5, 2024 · Updated last year