surge-ai / profanity

The world's largest profanity list.

☆214

Alternatives and similar repositories for profanity:

Users who are interested in profanity are comparing it to the libraries listed below:

surge-ai / toxicity
The world's largest social media toxicity dataset.
☆177 Updated 2 years ago
maxent-ai / converse
Conversational text analysis using various NLP techniques
☆181 Updated last year
valeriobasile / hurtlex
A multilingual lexicon of words to hurt.
☆83 Updated 3 months ago
Yale-LILY / SummerTime
An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo
☆275 Updated last year
intelligence-csd-auth-gr / Ethos-Hate-Speech-Dataset
This repository contains a dataset for hate speech detection on social media platforms.
☆70 Updated 2 years ago
maxent-ai / zeroshot_topics
Topic Inference with Zeroshot models
☆61 Updated last year
99sbr / semantic-search-with-sbert
Build Semantic Search with S-BERT and Fine-tune your model in an unsupervised way
☆58 Updated 2 years ago
hate-alert / DE-LIMIT
DeEpLearning models for MultIlingual haTespeech (DELIMIT): Benchmarking multilingual models across 9 languages and 16 datasets.
☆108 Updated last year
zacanger / profane-words
A very long list of English profanity.
☆252 Updated 2 months ago
indix / whatthelang
Lightning Fast Language Prediction 🚀
☆165 Updated 5 years ago
KennethEnevoldsen / asent
Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy.
☆117 Updated 10 months ago
Kvasirs / MILES
MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert…
☆48 Updated 3 years ago
leondz / hatespeechdata
Catalog of abusive language data (PLoS 2020)
☆308 Updated 8 months ago
julian-risch / toxic-comment-collection
Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in …
☆27 Updated 3 years ago
ranvijaykumar / typo
A Python package to simulate typographical errors.
☆31 Updated last year
LoLei / redditcleaner
Cleans Reddit Text Data
☆81 Updated 4 years ago
mit-ccc / TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweeban…
☆104 Updated last year
cardiffnlp / timelms
TimeLMs: Diachronic Language Models from Twitter
☆107 Updated 11 months ago
cfiltnlp / HiNER
This repository contains the HiNER dataset released with our paper at LREC 2022
☆14 Updated last year
Moradnejad / ColBERT-Using-BERT-Sentence-Embedding-for-Humor-Detection
ColBERT humor dataset for the task of humor detection, containing 200,000 jokes/news
☆70 Updated 4 months ago
notAI-tech / DeepTranslit
Efficient and easy to use transliteration for Indian languages
☆51 Updated 4 years ago
koaning / spacy-report
Generate reports for spaCy models.
☆29 Updated 2 years ago
MartinoMensio / spacy-sentence-bert
Sentence transformers models for spaCy
☆107 Updated last year
richardpaulhudson / holmes-extractor
Information extraction from English and German texts based on predicate logic
☆135 Updated last year
hetpandya / paraphrase-datasets-pretrained-models
A collection of preprocessed datasets and pretrained models for generating paraphrases.
☆29 Updated 3 years ago
MaxHalford / clavier
🔤 Measure edit distance based on keyboard layout
☆59 Updated last year
DanManN / pass2act
Passive/Active sentence Transformer
☆28 Updated 6 years ago
Vicomtech / hate-speech-dataset
Hate speech dataset from Stormfront forum manually labelled at sentence level.
☆168 Updated 4 years ago
LSYS / LexicalRichness
A module to compute textual lexical richness (aka lexical diversity).
☆98 Updated last year
pmbaumgartner / setfit
☆42 Updated last year