Papers from our SoK on Red-Teaming (Accepted at TMLR)
☆43Apr 14, 2026Updated 2 weeks ago
Alternatives and similar repositories for awesome-red-teaming-llms
Users that are interested in awesome-red-teaming-llms are comparing it to the libraries listed below.
- Emoji Attack [ICML 2025]☆41Jul 15, 2025Updated 9 months ago
- A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation☆68Mar 2, 2026Updated 2 months ago
- ☆44Oct 1, 2024Updated last year
- ☆14Feb 26, 2025Updated last year
- Source code, datasets and models of the paper "Efficient White-box Fairness Testing through Gradient Search" by Lingfeng Zhang, Yueling Z…☆11Jul 24, 2021Updated 4 years ago
- Identification of the Adversary from a Single Adversarial Example (ICML 2023)☆10Jul 15, 2024Updated last year
- ☆20May 14, 2025Updated 11 months ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated 2 years ago
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…☆10Feb 7, 2026Updated 2 months ago
- [NeurIPS'24] "NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes" by Hao-Lun …☆10Sep 18, 2025Updated 7 months ago
- Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"☆10Aug 16, 2022Updated 3 years ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…☆13Jan 26, 2025Updated last year
- [TOIS'24] "RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation"☆16Dec 1, 2024Updated last year
- ☆12Mar 24, 2023Updated 3 years ago
- Adversarial Item Promotion in visually-aware recommenders☆17Sep 3, 2021Updated 4 years ago
- Demo code for the paper: One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features☆12Nov 30, 2023Updated 2 years ago
- ☆11Jun 20, 2023Updated 2 years ago
- ☆16May 16, 2025Updated 11 months ago
- Watermarking LLM papers up-to-date☆12Dec 17, 2023Updated 2 years ago
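- Topic: Computational Cognitive Science. This repository originated as the IA003 graduation BP; it is still in its early stages and its content structure has yet to be finalized. The next major update is expected in three or four years.☆17May 22, 2019Updated 6 years ago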
- Code for CVPR24 Paper - Resource-Efficient Transformer Pruning for Finetuning of Large Models☆12Oct 31, 2025Updated 6 months ago
- ☆16Feb 8, 2024Updated 2 years ago
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? (CVPR 2021)☆14Jul 16, 2021Updated 4 years ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)☆15Jul 9, 2023Updated 2 years ago
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.☆18Jan 14, 2025Updated last year
- [EACL'23] COVID-VTS: Fact Extraction and Verification on Short Video Platforms☆11Sep 26, 2023Updated 2 years ago
- ☆13May 25, 2022Updated 3 years ago
- [ACL 2023 findings] Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization☆17Aug 26, 2023Updated 2 years ago
- REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation☆50Apr 25, 2026Updated last week
- code for "Generative News Recommendation"☆15May 31, 2024Updated last year
- ☆17Sep 25, 2024Updated last year
- ☆60Jun 13, 2024Updated last year
- Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation (ICLR 2025)”☆27Oct 23, 2025Updated 6 months ago
- Start here!☆11Feb 19, 2020Updated 6 years ago
- Image Captioning Model Implemented in PyTorch using CNN followed by LSTM☆13Apr 5, 2018Updated 8 years ago
- ☆21Jun 16, 2025Updated 10 months ago
- Towards LLM Empowered Recommendation via Tool Learning☆23Aug 8, 2025Updated 8 months ago
- The official repository for guided jailbreak benchmark☆29Jul 28, 2025Updated 9 months ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"☆22Sep 21, 2025Updated 7 months ago