nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed for researchers to study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆33Updated 7 years ago
Alternatives and similar repositories for SFGram-dataset
Users who are interested in SFGram-dataset compare it to the repositories listed below
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo…☆114Updated 7 years ago
- A corpus and code for understanding norms and subjectivity. 🤖☆53Updated last year
- Pipeline to generate the Standardized Project Gutenberg Corpus☆208Updated 2 years ago
- ☆64Updated 2 years ago
- Parse Sentences to extract evoked frames.☆10Updated 6 years ago
- Finds linguistic patterns effortlessly☆39Updated 2 years ago
- Analysis of the Gutenberg dataset☆44Updated 7 years ago
- A Python package for cleaning Gutenberg books and datasets☆34Updated 9 months ago
- ☆197Updated last year
- ☆24Updated last year
- MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations.☆15Updated 6 years ago
- Agents that build knowledge graphs and explore textual worlds by asking questions☆79Updated 2 years ago
- Releases for the reddit-graph project☆18Updated last year
- Get answers to research questions from 200M+ papers.☆208Updated 3 months ago
- A corpus of poetry from Project Gutenberg☆212Updated 7 years ago
- Frame Semantic Parser based on T5 and FrameNet☆64Updated 2 years ago
- Adversarial Training on Transformer Networks to discover check-worthy factual claims☆84Updated 2 years ago
- Repo for the paper "Detecting Logical Fallacies: From Quiz to Climate Change News" (2021)☆84Updated 2 years ago
- Weird A.I. Yankovic neural-net based lyrics parody generator☆84Updated 4 years ago
- The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.☆37Updated 4 years ago
- Code for Stage-wise Fine-tuning for Graph-to-Text Generation☆26Updated 3 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆26Updated 3 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts.☆40Updated 6 years ago
- A large scale Humor Dataset, containing more than 550k rated English jokes (LREC'20)☆74Updated 2 years ago
- Semantically Structured Sentence Embeddings☆71Updated last year
- The ScriptBase Corpus☆47Updated 7 years ago
- Human-free quality estimation of document summaries☆97Updated 2 months ago
- A question-answering dataset with a focus on subjective information☆48Updated 2 years ago
- Code repo for "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio" at the (CAI2) wor…☆214Updated 2 years ago
- [LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweeban…☆105Updated 2 years ago