surge-ai / profanity
The world's largest profanity list.
☆212 · Updated 9 months ago
Alternatives and similar repositories for profanity:
Users who are interested in profanity are comparing it to the repositories listed below.
- The world's largest social media toxicity dataset. ☆178 · Updated 2 years ago
- Conversational text Analysis using various NLP techniques ☆179 · Updated last year
- Convert Wikipedia database dumps into plaintext files ☆311 · Updated 3 years ago
- Question-answers, collected from Google ☆125 · Updated 3 years ago
- DeEpLearning models for MultIlingual haTespeech (DELIMIT): Benchmarking multilingual models across 9 languages and 16 datasets. ☆108 · Updated last year
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. ☆86 · Updated last week
- Cleans Reddit Text Data ☆81 · Updated 4 years ago
- Weird A.I. Yankovic neural-net based lyrics parody generator ☆84 · Updated 2 years ago
- MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆48 · Updated 3 years ago
- A Directory of Online Newspaper Sources for 70+ Languages ☆32 · Updated 3 years ago
- This repository contains papers and resources pertaining to Hate speech research. ☆43 · Updated 3 years ago
- NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT ☆231 · Updated last year
- Passive/Active sentence Transformer ☆28 · Updated 6 years ago
- Testing and training detection models for emoji-based hate speech. ☆23 · Updated 2 years ago
- Code for obtaining the Curation Corpus abstractive text summarisation dataset ☆125 · Updated 4 years ago
- Topic Inference with Zeroshot models ☆61 · Updated last year
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆27 · Updated 3 years ago
- Röttger et al. (ACL 2021): "HateCheck: Functional Tests for Hate Speech Detection Models" - Data ☆57 · Updated 3 years ago
- A collection of preprocessed datasets and pretrained models for generating paraphrases. ☆29 · Updated 3 years ago
- AfriBERTa: Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages ☆66 · Updated 2 years ago
- Google's Meena transformer chatbot implementation ☆105 · Updated 3 years ago
- a bot that generates realistic replies using a combination of pretrained GPT-2 and BERT models ☆192 · Updated 4 years ago
- This repository contains the HiNER dataset released with our paper at LREC 2022 ☆15 · Updated last year
- An on-going dataset consisting of hashtags, n-gram counts and other misc NLP things for covid-19 analysis, stemming from over 100 000 000… ☆57 · Updated 2 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆167 · Updated last year
- Download subreddit comments ☆93 · Updated 2 years ago
- A module to compute textual lexical richness (aka lexical diversity). ☆98 · Updated last year
- This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences fro… ☆159 · Updated 3 months ago
- Question Generation - Question Answering for Automatic Flashcards ☆64 · Updated 2 years ago