This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
☆346 · Jun 17, 2024 · Updated last year
Alternatives and similar repositories for TOXIGEN
Users that are interested in TOXIGEN are comparing it to the libraries listed below.
- Repository for the Dynamically Generated Hate Speech Dataset by Vidgen et al. (2021). ☆46 · May 26, 2025 · Updated 11 months ago
- ☆44 · Jun 29, 2023 · Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆88 · Mar 2, 2021 · Updated 5 years ago
- Röttger et al. (ACL 2021): "HateCheck: Functional Tests for Hate Speech Detection Models" - Data ☆59 · Oct 14, 2025 · Updated 7 months ago
- ☆231 · Feb 23, 2021 · Updated 5 years ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆350 · Feb 23, 2024 · Updated 2 years ago
- Code and data for the EMNLP 2021 paper "Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts". Coming so… ☆17 · Jul 27, 2023 · Updated 2 years ago
- code for our EACL 2021 paper: "Challenges in Automated Debiasing for Toxic Language Detection" by Xuhui Zhou, Maarten Sap, Swabha Swayamd… ☆20 · Aug 20, 2021 · Updated 4 years ago
- ☆28 · Feb 27, 2025 · Updated last year
- Official repository of "HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning", Findings of EMNLP 2023 ☆28 · Jan 25, 2024 · Updated 2 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,841 · Jun 17, 2025 · Updated 11 months ago
- Generalizable Implicit Hate Speech Detection using Contrastive Learning (COLING 2022) ☆14 · Oct 9, 2022 · Updated 3 years ago
- ☆12 · Oct 23, 2022 · Updated 3 years ago
- Can we use explanations to improve hate speech models? Our paper accepted at AAAI 2021 tries to explore that question. ☆239 · Jun 12, 2023 · Updated 2 years ago
- Automated Pyramid Summarization Evaluation ☆12 · Jun 2, 2024 · Updated last year
- Chinese safety prompts for evaluating and improving the safety of LLMs. ☆1,158 · Feb 27, 2024 · Updated 2 years ago
- "Stones from other hills may serve to polish jade": Fudan Whitzard AI (复旦白泽智能) releases JADE-DB, a demo dataset targeting domestic open-source and foreign commercial large language models ☆510 · Nov 18, 2025 · Updated 6 months ago
- IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases". ☆16 · Aug 14, 2020 · Updated 5 years ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆135 · Mar 1, 2024 · Updated 2 years ago
- Hate speech detection corpus in Korean, shared with EMNLP 2023 paper ☆17 · Apr 19, 2024 · Updated 2 years ago
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489) ☆20 · Oct 22, 2024 · Updated last year
- NSMC, KorSTS ... fine-tunings ☆18 · Feb 23, 2022 · Updated 4 years ago
- Repository for the Bias Benchmark for QA dataset. ☆142 · Jan 8, 2024 · Updated 2 years ago
- python project template for personal projects! 🙋‍♀️ ☆11 · Nov 28, 2020 · Updated 5 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆915 · Jan 16, 2025 · Updated last year
- ☆27 · Nov 20, 2023 · Updated 2 years ago
- Fortifying Toxic Speech Detectors Against Veiled Toxicity ☆11 · Oct 21, 2020 · Updated 5 years ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆313 · Sep 16, 2024 · Updated last year
- Using GPT-3 to detect hate speech that contains sexist and racist content ☆24 · Nov 11, 2025 · Updated 6 months ago
- ☆14 · Jan 6, 2025 · Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆111 · Mar 8, 2024 · Updated 2 years ago
- Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ☆582 · Feb 27, 2026 · Updated 2 months ago
- An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions" ☆16 · Nov 5, 2024 · Updated last year
- Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models ☆275 · May 13, 2024 · Updated 2 years ago
- Research on evaluating and aligning the values of Chinese large language models ☆556 · Jul 20, 2023 · Updated 2 years ago
- ☆161 · Aug 9, 2022 · Updated 3 years ago
- Corpus Annotation Graph builder (CAG) is an architectural framework that employs the build-and-annotate pattern for creating a graph. ☆14 · Dec 7, 2023 · Updated 2 years ago
- Official datasets and pytorch implementation repository of SQuARe and KoSBi (ACL 2023) ☆250 · Jun 29, 2023 · Updated 2 years ago
- ☆236 · Dec 27, 2016 · Updated 9 years ago