microsoft/TOXIGEN

This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.

☆351

Alternatives and similar repositories for TOXIGEN

Users who are interested in TOXIGEN are comparing it to the repositories listed below.

SALT-NLP / implicit-hate
☆46 · Jun 29, 2023 · Updated 3 years ago
bvidgen / Dynamically-Generated-Hate-Speech-Dataset
Repository for the Dynamically Generated Hate Speech Dataset by Vidgen et al. (2021).
☆44 · May 26, 2025 · Updated last year
amazon-science / bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper
☆88 · Mar 2, 2021 · Updated 5 years ago
paul-rottger / hatecheck-data
Röttger et al. (ACL 2021): "HateCheck: Functional Tests for Hate Speech Detection Models" - Data
☆59 · Oct 14, 2025 · Updated 9 months ago
allenai / real-toxicity-prompts
☆233 · Feb 23, 2021 · Updated 5 years ago
LLM-Tuning-Safety / LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358 · Feb 23, 2024 · Updated 2 years ago
abaheti95 / ToxiChat
Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming so…
☆17 · Jul 27, 2023 · Updated 2 years ago
XuhuiZhou / Toxic_Debias
Code for our EACL 2021 paper: "Challenges in Automated Debiasing for Toxic Language Detection" by Xuhui Zhou, Maarten Sap, Swabha Swayamd…
☆20 · Aug 20, 2021 · Updated 4 years ago
unitaryai / detoxify
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transfor…
☆1,278 · Jul 6, 2026 · Updated 2 weeks ago
hartvigsen-group / composable-interventions
☆29 · Feb 27, 2025 · Updated last year
joonkeekim / hare-hate-speech
Official repository of "HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning", Findings of EMNLP 2023
☆28 · Jan 25, 2024 · Updated 2 years ago
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,851 · Jun 17, 2025 · Updated last year
youngwook06 / ImpCon
Generalizable Implicit Hate Speech Detection using Contrastive Learning (COLING 2022)
☆14 · Oct 9, 2022 · Updated 3 years ago
hate-alert / HateXplain
Can we use explanations to improve hate speech models? Our paper, accepted at AAAI 2021, explores that question.
☆248 · Jun 12, 2023 · Updated 3 years ago
serenayj / PyrEval
Automated Pyramid Summarization Evaluation
☆12 · Jun 2, 2024 · Updated 2 years ago
microsoft / adaptive-testing
Find and fix bugs in natural language machine learning models using adaptive testing.
☆190 · May 7, 2024 · Updated 2 years ago
thu-coai / Safety-Prompts
Chinese safety prompts for evaluating and improving the safety of LLMs.
☆1,186 · Feb 27, 2024 · Updated 2 years ago
whitzard-ai / jade-db
"Stones from other hills can polish jade": a series on large-model evaluation and governance released by the Fudan JADE team
☆517 · May 14, 2026 · Updated 2 months ago
kelichiu / GPT3-hate-speech-detection
Using GPT-3 to detect hate speech that contains sexist and racist content
☆24 · Nov 11, 2025 · Updated 8 months ago
swabhs / notebooks_for_aflite
IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases".
☆16 · Aug 14, 2020 · Updated 5 years ago
ssu-humane / K-HATERS
Hate speech detection corpus in Korean, shared with EMNLP 2023 paper
☆17 · Apr 19, 2024 · Updated 2 years ago
nyu-mll / crows-pairs
This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske…
☆137 · Mar 1, 2024 · Updated 2 years ago
haven-jeon / KoGPT2-subtasks
NSMC, KorSTS ... fine-tunings
☆18 · Feb 23, 2022 · Updated 4 years ago
nyu-mll / BBQ
Repository for the Bias Benchmark for QA dataset.
☆146 · Jan 8, 2024 · Updated 2 years ago
codertimo / python-template
python project template for personal projects! 🙋‍♀️
☆11 · Nov 28, 2020 · Updated 5 years ago
sylinrl / TruthfulQA
TruthfulQA: Measuring How Models Imitate Human Falsehoods
☆933 · Jan 16, 2025 · Updated last year
MinhDucBui / Multi3Hate
☆15 · Jan 6, 2025 · Updated last year
shauli-ravfogel / adv-kernel-removal
☆12 · Oct 23, 2022 · Updated 3 years ago
xinleihe / toxic-prompt
☆27 · Nov 20, 2023 · Updated 2 years ago
xhan77 / veiled-toxicity-detection
Fortifying Toxic Speech Detectors Against Veiled Toxicity
☆11 · Oct 21, 2020 · Updated 5 years ago
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆281 · May 13, 2024 · Updated 2 years ago
AI-secure / DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
☆314 · Sep 16, 2024 · Updated last year
declare-lab / red-instruct
Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"
☆111 · Mar 8, 2024 · Updated 2 years ago
sherdencooper / GPTFuzz
Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆599 · Feb 27, 2026 · Updated 4 months ago
velocityCavalry / CREPE
An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions"
☆16 · Nov 5, 2024 · Updated last year
shreyansh26 / Extracting-Training-Data-from-Large-Langauge-Models
A re-implementation of the "Extracting Training Data from Large Language Models" paper by Carlini et al., 2020
☆39 · Jul 10, 2022 · Updated 4 years ago
X-PLUG / CValues
Research on evaluating and aligning the values of Chinese large language models
☆560 · Jul 20, 2023 · Updated 3 years ago
naver-ai / korean-safety-benchmarks
Official datasets and PyTorch implementation repository of SQuARe and KoSBi (ACL 2023)
☆252 · Jun 29, 2023 · Updated 3 years ago
zeeraktalat / hatespeech
☆236 · Dec 27, 2016 · Updated 9 years ago