CornellNLP / ConvoKit
ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
☆554 · Updated this week
Related projects
Alternatives and complementary repositories for ConvoKit
- Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenizati… ☆661 · Updated 8 months ago
- analyze text with empath ☆315 · Updated 7 years ago
- Catalog of abusive language data (PLoS 2020) ☆304 · Updated 5 months ago
- Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities. ☆342 · Updated last year
- A dataset containing human-human knowledge-grounded open-domain conversations. ☆632 · Updated 3 months ago
- A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coher… ☆1,203 · Updated 10 months ago
- Dialogue model that produces empathetic responses when trained on the EmpatheticDialogues dataset. ☆451 · Updated 2 years ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆725 · Updated 3 months ago
- A sentence segmenter that actually works! ☆302 · Updated 4 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆158 · Updated 10 months ago
- High-accuracy NLP parser with models for 11 languages. ☆871 · Updated 2 years ago
- Officially supported AllenNLP models ☆528 · Updated last year
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆249 · Updated 2 months ago
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary. ☆312 · Updated last month
- BERTweet: A pre-trained language model for English Tweets (EMNLP-2020) ☆575 · Updated 3 months ago
- Linguistic Inquiry and Word Count (LIWC) analyzer ☆193 · Updated 2 years ago
- Topic Modeling in Embedding Spaces ☆546 · Updated last year
- A Survey and Experiments on Annotated Corpora for Emotion Classification in Text ☆225 · Updated last year
- 📗 Score text readability using a number of formulas: Flesch-Kincaid Grade Level, Gunning Fog, ARI, Dale Chall, SMOG, and more ☆361 · Updated 2 months ago
- Mining individual characters in multiparty dialogue ☆164 · Updated last year
- 📃 Language Model based sentences scoring library ☆303 · Updated 2 years ago
- BLEURT is a metric for Natural Language Generation based on transfer learning. ☆697 · Updated last year
- ☆226 · Updated 7 years ago
- This repository contains EmoBank, a large-scale text corpus manually annotated with emotion according to the psychological Valence-Arousa… ☆195 · Updated last year
- Stanford's Alexa Prize socialbot ☆131 · Updated last year
- Implementation of the ClausIE information extraction system for python+spacy ☆220 · Updated 2 years ago
- Datasets for Hate Speech Detection ☆115 · Updated last year
- Switchboard Dialog Act Corpus with Penn Treebank links ☆139 · Updated 3 years ago
- Large datasets for conversational AI ☆1,294 · Updated 5 years ago
- Python port of Moses tokenizer, truecaser and normalizer ☆488 · Updated 5 months ago