walledai/walledeval

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/walledai/walledeval)

walledai / walledeval

Test LLMs against jailbreaks and unprecedented harms

☆40

Alternatives and similar repositories for walledeval

Users that are interested in walledeval are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

declare-lab / ferret
View on GitHub
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
☆19Aug 22, 2024Updated last year
declare-lab / VIP
View on GitHub
Our EMNLP 2022 paper on VIP-Based Prompting for Parameter-Efficient Learning
☆10Oct 22, 2022Updated 3 years ago
declare-lab / KNOT
View on GitHub
This repository contains the implementation of the paper -- KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks
☆15Sep 15, 2022Updated 3 years ago
declare-lab / CIDER
View on GitHub
This repository contains the dataset and the pytorch implementations of the models from the paper CIDER: Commonsense Inference for Dialog…
☆27Oct 30, 2022Updated 3 years ago
snu-mllab / Bayesian-Red-Teaming
View on GitHub
About Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)
☆15Jul 9, 2023Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
microsoft / ConstrainedReasoner
View on GitHub
☆13Aug 26, 2024Updated last year
NetSPI / Open-LLM-Security-Benchmark
View on GitHub
☆22Dec 16, 2024Updated last year
declare-lab / resta
View on GitHub
Restore safety in fine-tuned language models through task arithmetic
☆33Mar 28, 2024Updated 2 years ago
diambra / agents
View on GitHub
Example Agents for DIAMBRA Arena Environments
☆18Sep 3, 2024Updated last year
declare-lab / red-instruct
View on GitHub
Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
☆111Mar 8, 2024Updated 2 years ago
declare-lab / della
View on GitHub
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
☆37Jul 12, 2024Updated 2 years ago
declare-lab / speech-adapters
View on GitHub
Codes and datasets for our ICASSP2023 paper, Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech und…
☆43Mar 12, 2023Updated 3 years ago
LIONS-EPFL / Charmer
View on GitHub
Revisiting Character-level Adversarial Attacks for Language Models, ICML 2024
☆19Feb 12, 2025Updated last year
ecmonsen / gendered_words
View on GitHub
Dictionary of English words tagged with their natural gender.
☆13Sep 7, 2021Updated 4 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
declare-lab / WikiDes
View on GitHub
A Wikipedia-based summarization dataset
☆14Mar 27, 2023Updated 3 years ago
declare-lab / safety-arithmetic
View on GitHub
☆13Jan 14, 2025Updated last year
NY1024 / SafeBench
View on GitHub
☆22Oct 25, 2024Updated last year
claws-lab / casper
View on GitHub
Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"
☆10Aug 16, 2022Updated 3 years ago
EoinKenny / Prototype-Wrapper-Network-ICLR23
View on GitHub
☆12Dec 15, 2024Updated last year
zhaoyiran924 / Probe-Sampling
View on GitHub
[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
☆35Nov 8, 2024Updated last year
CSIPlab / EBAD
View on GitHub
Code repository for Ensemble-based Blackbox Attacks on Dense Prediction (EBAD), CVPR 2023
☆28May 17, 2024Updated 2 years ago
allenai / wildteaming
View on GitHub
☆42Aug 10, 2024Updated last year
teddysum / korean_evaluation
View on GitHub
☆10Jun 5, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
W-Wu / ERC-SLT22
View on GitHub
Code for "Distribution-based Emotion Recognition in Conversation"
☆18Feb 6, 2023Updated 3 years ago
CGCL-codes / DarkSAM
View on GitHub
The implementation of our NeurIPS 2024 paper "DarkSAM: Fooling Segment Anything Model to Segment Nothing".
☆14Nov 4, 2024Updated last year
CryptoAILab / JailbreakEval
View on GitHub
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆193Apr 1, 2025Updated last year
weiyezhimeng / SQL-Injection-Jailbreak
View on GitHub
☆22Jul 26, 2025Updated 11 months ago
McGill-NLP / feedbackqa
View on GitHub
FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback
☆12Jul 13, 2022Updated 4 years ago
rainavyas / prepend_acoustic_attack
View on GitHub
Prepend universal audio attack segment to mute Whisper
☆41Jan 22, 2025Updated last year
iNLP-Lab / reading-group
View on GitHub
☆18Jun 17, 2026Updated last month
declare-lab / EFLA
View on GitHub
Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
☆76Mar 26, 2026Updated 3 months ago
EmoryMLIP / DynamicBlocks
View on GitHub
A pytorch toolbox designed for experimentation using a generalization of the Resnet design.
☆15Jul 6, 2023Updated 3 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
DSN-2024 / DSN
View on GitHub
DSN jailbreak Attack & Evaluation Ensemble
☆17Feb 7, 2026Updated 5 months ago
shighghyujie / newpatch-rl
View on GitHub
Simultaneously Optimizing Perturbations and Positions for Black-box Adversarial Patch Attacks (TPAMI 2022)
☆35Feb 9, 2023Updated 3 years ago
liuzrcc / AIP
View on GitHub
Adversarial Item Promotion in visually-aware recommenders
☆17Sep 3, 2021Updated 4 years ago
keing1 / reward-hack-generalization
View on GitHub
Datasets used in the paper "Reward hacking behavior can generalize across tasks"
☆15Aug 17, 2025Updated 11 months ago
ShaojieJiang / CT-Loss
View on GitHub
The contrastive token loss function for reducing generative repetition of autoregressive neural language models.
☆13May 11, 2022Updated 4 years ago
salesforce / bite
View on GitHub
Code for "Mind Your Inflections! Improving NLP for Non-Standard Englishes with Base-Inflection Encoding" (EMNLP 2020).
☆11May 1, 2025Updated last year
XiaofengZhu / uRank_uMart
View on GitHub
Listwise Learning to Rank by Exploring Unique Ratings (WSDM 2020)
☆13Nov 2, 2025Updated 8 months ago