unitaryai/detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

☆1,279

Alternatives and similar repositories for detoxify

Users interested in detoxify also compare it with the repositories listed below.

microsoft / TOXIGEN
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
☆351 · Jun 17, 2024 · Updated 2 years ago
hate-alert / HateXplain
Can we use explanations to improve hate speech models? Our paper accepted at AAAI 2021 tries to explore that question.
☆248 · Jun 12, 2023 · Updated 3 years ago
kelichiu / GPT3-hate-speech-detection
Using GPT-3 to detect hate speech that contains sexist and racist content
☆24 · Nov 11, 2025 · Updated 8 months ago
hate-alert / DE-LIMIT
DeEpLearning models for MultIlingual haTespeech (DELIMIT): Benchmarking multilingual models across 9 languages and 16 datasets.
☆112 · Jun 12, 2023 · Updated 3 years ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆61 · Jun 5, 2024 · Updated 2 years ago
QData / TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs…
☆3,453 · Apr 17, 2026 · Updated 3 months ago
paul-rottger / hatecheck-data
Röttger et al. (ACL 2021): "HateCheck: Functional Tests for Hate Speech Detection Models" - Data
☆59 · Oct 14, 2025 · Updated 9 months ago
t-davidson / hate-speech-and-offensive-language
Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017
☆846 · Jun 12, 2023 · Updated 3 years ago
hadarishav / Ruddit
This repo contains the dataset and description for Ruddit and its variants.
☆36 · Feb 13, 2022 · Updated 4 years ago
allenai / real-toxicity-prompts
☆233 · Feb 23, 2021 · Updated 5 years ago
MaartenGr / BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
☆7,754 · May 13, 2026 · Updated 2 months ago
aymeam / Datasets-for-Hate-Speech-Detection
Datasets for Hate Speech Detection
☆139 · May 12, 2023 · Updated 3 years ago
leondz / hatespeechdata
Catalog of abusive language data (PLoS 2020)
☆324 · Jun 14, 2024 · Updated 2 years ago
intelligence-csd-auth-gr / Ethos-Hate-Speech-Dataset
This repository contains a dataset for hate speech detection on social media platforms.
☆76 · Dec 9, 2022 · Updated 3 years ago
surge-ai / toxicity
The world's largest social media toxicity dataset.
☆192 · Jun 10, 2022 · Updated 4 years ago
makcedward / nlpaug
Data augmentation for NLP
☆4,663 · Updated this week
huggingface / sentence-transformers
State-of-the-Art Embeddings, Retrieval, and Reranking
☆18,941 · Updated this week
Hironsan / HateSonar
Hate Speech Detection Library for Python.
☆195 · Oct 26, 2025 · Updated 8 months ago
LAION-AI / CLIP-based-NSFW-Detector
☆471 · May 30, 2023 · Updated 3 years ago
valeriobasile / hurtlex
A multilingual lexicon of words to hurt.
☆99 · Oct 10, 2025 · Updated 9 months ago
datascisteven / Automated-Hate-Tweet-Detection
Developing a classification model to detect hate tweets ready for deployment using various NLP techniques
☆19 · Oct 7, 2024 · Updated last year
PAIR-code / lit
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic …
☆3,658 · Jul 7, 2026 · Updated 2 weeks ago
dhfbk / twitter-abusive-context-dataset
☆10 · Aug 31, 2022 · Updated 3 years ago
ipavlopoulos / toxic_spans
Detect toxic spans in toxic texts
☆70 · Jun 12, 2023 · Updated 3 years ago
alisawuffles / DExperts
Code associated with the ACL 2021 DExperts paper
☆119 · May 24, 2023 · Updated 3 years ago
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,777 · May 26, 2026 · Updated last month
webis-de / small-text
Active Learning for Text Classification in Python
☆646 · May 24, 2026 · Updated 2 months ago
ELS-RD / transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
☆1,690 · Oct 23, 2024 · Updated last year
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,852 · Jun 17, 2025 · Updated last year
conversationai / unintended-ml-bias-analysis
☆328 · Feb 25, 2026 · Updated 4 months ago
vjosapreniqi / MoralBERT
A tool for detecting moral values in social discourse
☆19 · Apr 24, 2025 · Updated last year
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
☆5,048 · Updated this week
GEM-benchmark / NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
☆786 · May 19, 2024 · Updated 2 years ago
google / BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
☆3,250 · Jul 19, 2024 · Updated 2 years ago
facebookresearch / anli
Adversarial Natural Language Inference Benchmark
☆402 · May 12, 2022 · Updated 4 years ago
openai / moderation-api-release
☆160 · Aug 9, 2022 · Updated 3 years ago
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,382 · Oct 27, 2025 · Updated 8 months ago
abaheti95 / ToxiChat
Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming so…
☆17 · Jul 27, 2023 · Updated 2 years ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆3,220 · Updated this week