surge-ai / profanity
The world's largest profanity list.
☆214 Updated 10 months ago
Alternatives and similar repositories for profanity:
Users who are interested in profanity are comparing it to the libraries listed below.
- The world's largest social media toxicity dataset. ☆177 Updated 2 years ago
- Conversational text Analysis using various NLP techniques ☆181 Updated last year
- A multilingual lexicon of words to hurt. ☆83 Updated 3 months ago
- An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo ☆275 Updated last year
- This repository contains a dataset for hate speech detection on social media platforms. ☆70 Updated 2 years ago
- Topic Inference with Zeroshot models ☆61 Updated last year
- Build Semantic Search with S-BERT and Fine-tune your model in unsupervised way ☆58 Updated 2 years ago
- DeEpLearning models for MultIlingual haTespeech (DELIMIT): Benchmarking multilingual models across 9 languages and 16 datasets. ☆108 Updated last year
- A very long list of English profanity. ☆252 Updated 2 months ago
- Lightning Fast Language Prediction 🚀 ☆165 Updated 5 years ago
- Asent is a python library for performing efficient and transparent sentiment analysis using spaCy. ☆117 Updated 10 months ago
- MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert… ☆48 Updated 3 years ago
- Catalog of abusive language data (PLoS 2020) ☆308 Updated 8 months ago
- Code for our WOAH@ACL 2021 Paper on Data Integration for Toxic Comment Classification: Making More Than 40 Datasets Easily Accessible in … ☆27 Updated 3 years ago
- A python package to simulate typographical errors. ☆31 Updated last year
- Cleans Reddit Text Data ☆81 Updated 4 years ago
- [LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweeban… ☆104 Updated last year
- TimeLMs: Diachronic Language Models from Twitter ☆107 Updated 11 months ago
- This repository contains the HiNER dataset released with our paper at LREC 2022 ☆14 Updated last year
- ColBERT humor dataset for the task of humor detection, containing 200,000 jokes/news ☆70 Updated 4 months ago
- Efficient and easy to use transliteration for Indian languages ☆51 Updated 4 years ago
- Generate reports for spaCy models. ☆29 Updated 2 years ago
- Sentence transformers models for SpaCy ☆107 Updated last year
- Information extraction from English and German texts based on predicate logic ☆135 Updated last year
- A collection of preprocessed datasets and pretrained models for generating paraphrases. ☆29 Updated 3 years ago
- 🔤 Measure edit distance based on keyboard layout ☆59 Updated last year
- Passive/Active sentence Transformer ☆28 Updated 6 years ago
- Hate speech dataset from Stormfront forum manually labelled at sentence level. ☆168 Updated 4 years ago
- A module to compute textual lexical richness (aka lexical diversity). ☆98 Updated last year
- ☆42 Updated last year