drivetosouth/SafeDialBench-Dataset



Official GitHub repo for SafeDialBench, a comprehensive multi-turn dialogue benchmark to evaluate LLMs' safety.

☆54

Alternatives and similar repositories for SafeDialBench-Dataset

Users interested in SafeDialBench-Dataset are comparing it to the repositories listed below.


HenryZhen97 / Reconsidering-Overthinking
Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning
☆23 · Jun 23, 2026 · Updated last month
chenyt31 / RoboHiMan
RoboHiMan: A Hierarchical Evaluation Paradigm for Compositional Generalization in Long-Horizon Manipulation
☆17 · Oct 16, 2025 · Updated 9 months ago
RL-VIG / LibContinual
A Framework of Continual Learning
☆136 · Dec 9, 2025 · Updated 7 months ago
HanjiangHu / NBF-LLM
The official code for "Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks".
☆18 · Jun 24, 2026 · Updated last month
Shaokang-Agent / D-F
Implementation of the paper "Egoism, Utilitarianism and Egalitarianism in Multi-Agent Reinforcement Learning"
☆21 · Aug 17, 2024 · Updated last year
NJUHuoJing / MAST
ManifoldAlignmentStyleTransfer
☆45 · Feb 24, 2022 · Updated 4 years ago
kriti-hippo / red_queen
Red Queen Dataset and data generation template
☆27 · Dec 26, 2025 · Updated 7 months ago
edward3862 / CariMe-pytorch
Unpaired Caricature Generation with Multiple Exaggerations (TMM 2021)
☆40 · Jul 14, 2021 · Updated 5 years ago
RL-VIG / LibFewShot
[TPAMI 2023] LibFewShot: A Comprehensive Library for Few-shot Learning.
☆1,071 · Oct 27, 2025 · Updated 9 months ago
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆26 · Nov 29, 2024 · Updated last year
salman-lui / x-teaming
☆67 · May 21, 2025 · Updated last year
jity16 / ACE-Off-Policy-Actor-Critic-with-Causality-Aware-Entropy-Regularization
Official PyTorch implementation of "ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization"
☆35 · May 13, 2024 · Updated 2 years ago
cheryyunl / ROVER
Official eval code for ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
☆27 · Dec 12, 2025 · Updated 7 months ago
irasin / Pytorch_MST
Unofficial Pytorch(1.0+) implementation of ICCV 2019 paper "Multimodal Style Transfer via Graph Cuts"
☆16 · Jan 9, 2020 · Updated 6 years ago
zhaoshiji123 / SI-Attack
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
☆16 · Aug 6, 2025 · Updated 11 months ago
VincenDen / IID
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation (CVPR24)
☆10 · Jun 16, 2024 · Updated 2 years ago
ErxinYu / CoSafe-Dataset
☆13 · Nov 12, 2024 · Updated last year
weiyezhimeng / SQL-Injection-Jailbreak
☆22 · Jul 26, 2025 · Updated last year
jianshuod / SafeSearch
[ICML 2026] Official implementations of "SafeSearch: Automated Red-Teaming of LLM-Based Search Agents"
☆19 · Mar 25, 2026 · Updated 4 months ago
ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆26 · Nov 25, 2025 · Updated 8 months ago
S1XCamus / NJUCS-Course-Material-from-YikaiZhang
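Nanjing University (NJU) Computer Science department course materials, assignments, code, and lab reports (data mining, pattern recognition, introduction to machine learning, probability and mathematical statistics, computer graphics, advanced programming, databases, fundamentals of computer systems, operating systems, programming labs, digital circuits, digital circuits labs... ). Being updated, star!
☆23 · Jun 28, 2020 · Updated 6 years ago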
HeyuanMingong / llirl
Code for "LifeLong Incremental Reinforcement Learning (LLIRL)"
☆21 · Jan 28, 2021 · Updated 5 years ago
remiMZ / HTS-ECCV22
☆11 · Oct 9, 2022 · Updated 3 years ago
NJU-LINK / CodeTracer
☆84 · Jun 19, 2026 · Updated last month
mtbench101 / mt-bench-101
[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
☆152 · Jul 24, 2024 · Updated 2 years ago
langfengQ / CoSo
Official code for paper "Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning"
☆15 · Jun 12, 2025 · Updated last year
WenbinLee / ADM
The Pytorch code of "Asymmetric Distribution Measure for Few-shot Learning", IJCAI 2020.
☆15 · Oct 9, 2020 · Updated 5 years ago
cjy97 / FSLKD
knowledge distillation for few-shot learning
☆13 · Dec 27, 2023 · Updated 2 years ago
tobylyf / adv-attack
Adversarial attacks including DeepFool and C&W
☆13 · May 20, 2019 · Updated 7 years ago
NJU-LINK / DRIFT
Design for Error Detection in Deep-Research Agents Trajectories.
☆22 · Jun 4, 2026 · Updated last month
yifan-h / MechanisticProbe
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
☆15 · Nov 4, 2023 · Updated 2 years ago
THU-KEG / SafetyNeuron
Data and code for the paper: Finding Safety Neurons in Large Language Models
☆30 · Jan 29, 2026 · Updated 5 months ago
lasgroup / SafetyPolytope
Learning Safety Constraints for Large Language Models (ICML2025)
☆35 · May 25, 2026 · Updated 2 months ago
Dawn0523 / LAIES
☆18 · Jul 14, 2023 · Updated 3 years ago
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆89 · Feb 26, 2025 · Updated last year
edward3862 / Analogist
Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model (SIGGRAPH 2024)
☆38 · Sep 10, 2024 · Updated last year
kangmintong / R-2-Guard
[ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
☆23 · Jul 8, 2024 · Updated 2 years ago
xujinglin / MvNNcor
The code of "Deep Embedded Complementary and Interactive Information for Multi-view Classification", AAAI 2020.
☆12 · May 28, 2020 · Updated 6 years ago
JacyCui / njucs
☆23 · Jan 5, 2025 · Updated last year