cisco-open / modelsmith

A toolkit for optimizing machine learning models for practical applications

☆27

Alternatives and similar repositories for modelsmith

Users who are interested in modelsmith are comparing it to the libraries listed below.

vfleaking / PTST
Code for the safety tests in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆18 · Updated last year
arobey1 / advbench
☆43 · Updated 2 years ago
sail-sg / Cheating-LLM-Benchmarks
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆79 · Updated 8 months ago
facebookresearch / jailbreak-objectives
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆32 · Updated 6 months ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆20 · Updated 6 months ago
allenai / wildteaming
☆29 · Updated 10 months ago
Confirm-Solutions / flrt
Fluent student-teacher redteaming
☆22 · Updated 11 months ago
XuandongZhao / weak-to-strong
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆76 · Updated last month
JonasGeiping / carving
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
☆69 · Updated last year
UCSB-NLP-Chang / ULD
Implementation of the paper 'Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference' [NeurIPS'24…
☆20 · Updated last year
locuslab / acr-memorization
☆35 · Updated 6 months ago
azshue / AutoPoison
The official repository of the paper "On the Exploitability of Instruction Tuning".
☆64 · Updated last year
SchwinnL / circuit-breakers-eval
Independent robustness evaluation of "Improving Alignment and Robustness with Short Circuiting"
☆17 · Updated 2 months ago
ShanglunFengatETHZ / PrivacyBackdoor
Privacy backdoors
☆51 · Updated last year
aks2203 / easy-to-hard-data
PyTorch Datasets for Easy-To-Hard
☆27 · Updated 5 months ago
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆36 · Updated 11 months ago
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆53 · Updated 10 months ago
princeton-polaris-lab / Evaluating-Durable-Safeguards
[ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs
☆13 · Updated last week
LukeBailey181 / obfuscated-activations
Codebase for "Obfuscated Activations Bypass LLM Latent-Space Defenses"
☆20 · Updated 4 months ago
arobey1 / smooth-llm
☆101 · Updated last year
microsoft / DPSDA
Private Evolution: Generating DP Synthetic Data without Training [ICLR 2024, ICML 2024 Spotlight]
☆97 · Updated last month
lchen001 / HAPI
☆17 · Updated 2 years ago
ejones313 / auditing-llms
☆54 · Updated 2 years ago
MadryLab / failure-directions
Distilling Model Failures as Directions in Latent Space
☆47 · Updated 2 years ago
amazon-science / controlling-llm-memorization
☆36 · Updated 2 years ago
ethz-spylab / realistic-adv-examples
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
☆20 · Updated last year
poloclub / llm-landscape
NeurIPS'24 - LLM Safety Landscape
☆23 · Updated 4 months ago
milesaturpin / cot-unfaithfulness
☆45 · Updated last year
rmin2000 / adv_tracing
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10 · Updated 11 months ago
thestephencasper / latent_adversarial_training
☆22 · Updated 11 months ago