CornellNLP / ConvoKit
ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
☆542 Updated last week
Related projects:
- analyze text with empath ☆311 Updated 7 years ago
- Dialogue model that produces empathetic responses when trained on the EmpatheticDialogues dataset. ☆437 Updated 2 years ago
- A dataset containing human-human knowledge-grounded open-domain conversations. ☆620 Updated last month
- Linguistic Inquiry and Word Count (LIWC) analyzer ☆191 Updated 2 years ago
- Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenizati… ☆660 Updated 6 months ago
- Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities. ☆340 Updated last year
- A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher… ☆1,192 Updated 8 months ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆724 Updated last month
- A reading list of up-to-date papers on NLP for Social Good. ☆276 Updated last year
- Large datasets for conversational AI ☆1,279 Updated 4 years ago
- Topic Modeling in Embedding Spaces ☆538 Updated 11 months ago
- A module to compute textual lexical richness (aka lexical diversity). ☆90 Updated last year
- BERTweet: A pre-trained language model for English Tweets (EMNLP-2020) ☆574 Updated last month
- A Survey and Experiments on Annotated Corpora for Emotion Classification in Text ☆222 Updated last year
- The Schema-Guided Dialogue Dataset ☆539 Updated last year
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆685 Updated last year
- Catalog of abusive language data (PLoS 2020) ☆299 Updated 3 months ago
- Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how? ☆505 Updated last year
- High-accuracy NLP parser with models for 11 languages. ☆858 Updated 2 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆155 Updated 8 months ago
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy ☆1,339 Updated 3 months ago
- Repository for TweetEval ☆354 Updated 2 years ago
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆249 Updated 2 weeks ago
- Compute Sentence Embeddings Fast! ☆619 Updated last year
- BookNLP, a natural language processing pipeline for books ☆782 Updated last month
- A Python library for calculating a large variety of metrics from text ☆309 Updated this week
- Google USE (Universal Sentence Encoder) for spaCy ☆176 Updated last year
- Enhanced Subject Word Object Extraction ☆148 Updated 3 years ago
- Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx ☆625 Updated 3 years ago
- Hate speech dataset from Stormfront forum manually labelled at sentence level. ☆161 Updated 4 years ago