nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed for researchers who want to study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆30 Updated 5 years ago
Related projects
Alternatives and complementary repositories for SFGram-dataset
- Analysis of Gutenberg dataset ☆40 Updated 5 years ago
- ☆50 Updated last year
- The ScriptBase Corpus ☆42 Updated 6 years ago
- Materials for PyCon 2020 Workshop, "Nonsense verse... with Python and machine learning" ☆30 Updated last year
- Finds linguistic patterns effortlessly ☆33 Updated last year
- A corpus and code for understanding norms and subjectivity. 🤖 ☆45 Updated last month
- Weird A.I. Yankovic neural-net based lyrics parody generator ☆84 Updated 2 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆158 Updated 10 months ago
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing. ☆19 Updated 2 years ago
- Generates free or fixed verse poetry from any text corpus using Ngram natural language generator (markov chains) + pos tagging + rhyme id… ☆28 Updated 10 years ago
- Promoting critical thinking through machine-generated prompts. ☆18 Updated 3 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 Updated last year
- A Python package for cleaning Gutenberg books and dataset ☆32 Updated last year
- Scraper for downloading the entire ebooks repository of Project Gutenberg ☆135 Updated 2 weeks ago
- PoKi: A Large Dataset of Poems by Children ☆34 Updated 4 years ago
- Generating Interactive Fiction worlds from story plots ☆74 Updated last year
- Stylometry library for Burrows' Delta method ☆33 Updated 6 months ago
- MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations. ☆15 Updated 5 years ago
- Poetic processing, for Python. ☆38 Updated 6 months ago
- Question Generation - Question Answering for Automatic Flashcards ☆64 Updated 2 years ago
- I.PHI dataset generation ☆25 Updated 11 months ago
- Dataset accompanying the paper "Investigating African-American Vernacular English in Transformer-Based Text Generation." ☆9 Updated 2 years ago
- Agents that build knowledge graphs and explore textual worlds by asking questions ☆77 Updated last year
- Gutenberg cache and query library ☆36 Updated 3 months ago
- This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing. ☆11 Updated 3 years ago
- Lexicons for the Multilingual UCREL Semantic Analysis System ☆39 Updated last year
- List of corpora annotated for coreference for different languages ☆17 Updated 3 months ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆32 Updated last year
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆21 Updated 2 years ago
- This repository includes the code for neural DRS parsing ☆27 Updated last year