cisco-open / modelsmith
A toolkit for optimizing machine learning models for practical applications
☆26 · Updated 2 months ago
Alternatives and similar repositories for modelsmith
Users interested in modelsmith are also comparing it to the repositories listed below.
- ☆43 · Updated 2 years ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆78 · Updated 6 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆93 · Updated 11 months ago
- Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks" ☆29 · Updated 5 months ago
- The official implementation of the pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models" ☆46 · Updated 6 months ago
- ☆25 · Updated 9 months ago
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization" ☆47 · Updated last month
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks" ☆52 · Updated 9 months ago
- Independent robustness evaluation of "Improving Alignment and Robustness with Short Circuiting" ☆16 · Updated last month
- Code for the safety tests in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆18 · Updated last year
- [CVPR 2022] "Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free" by Tianlong Chen*, Zhenyu Zhang*, Yihua Zhang*, Shiyu C… ☆26 · Updated 2 years ago
- ☆55 · Updated 11 months ago
- Package to optimize adversarial attacks against (large) language models with varied objectives ☆68 · Updated last year
- [ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers ☆12 · Updated 3 years ago
- ☆46 · Updated last week
- ☆43 · Updated 3 months ago
- ☆20 · Updated 5 months ago
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts" ☆36 · Updated 10 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆74 · Updated 2 weeks ago
- Codebase for Decoding Compressed Trust ☆23 · Updated last year
- ☆170 · Updated last year
- "Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs" (ACM CCS '21) ☆17 · Updated 2 years ago
- ☆24 · Updated 7 months ago
- Debiasing Through Data Attribution ☆11 · Updated 11 months ago
- Code for the paper "Decomposing The Dark Matter of Sparse Autoencoders" ☆22 · Updated 3 months ago
- ☆21 · Updated 4 months ago
- ☆37 · Updated 9 months ago
- [ICLR 2025] Official repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆55 · Updated 2 months ago
- ☆39 · Updated 7 months ago
- NeurIPS'24 - LLM Safety Landscape ☆22 · Updated 2 months ago