nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed for researchers who want to study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆31Updated 6 years ago
Alternatives and similar repositories for SFGram-dataset
Users that are interested in SFGram-dataset are comparing it to the libraries listed below
- Weird A.I. Yankovic neural-net based lyrics parody generator☆84Updated 3 years ago
- ☆59Updated 2 years ago
- ☆30Updated 8 years ago
- ☆24Updated 9 months ago
- a python package for cleaning Gutenberg books and dataset☆34Updated last month
- A corpus of poetry from Project Gutenberg☆203Updated 6 years ago
- One stop shop for all things carp☆59Updated 2 years ago
- Parse Sentences to extract evoked frames.☆10Updated 6 years ago
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing.☆19Updated 2 years ago
- Cleaned up version of the PlotMachines code☆66Updated 2 years ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this☆223Updated 2 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo…☆108Updated 6 years ago
- Finds linguistic patterns effortlessly☆36Updated last year
- Code for SaGe subword tokenizer (EACL 2023)☆25Updated 6 months ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.☆33Updated 2 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus☆184Updated last year
- Libraries, Archives and Museums (LAM)☆84Updated 2 years ago
- an experimental implementation of Burrows' Delta in Python 3☆21Updated 3 years ago
- An NLP processing pipeline for characters in fanfiction. Developed by students at Carnegie Mellon University from 2019-2021.☆33Updated 9 months ago
- The ScriptBase Corpus☆44Updated 7 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training☆43Updated 4 years ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023☆19Updated 2 years ago
- a writeup on some experiments on a sequence model for chess games☆30Updated 3 years ago
- Analysis of gutenberg dataset☆44Updated 6 years ago
- Factored Cognition Primer: How to write compositional language model programs☆49Updated 2 years ago
- A corpus and code for understanding norms and subjectivity. 🤖☆50Updated 8 months ago
- Neural network poetry rewriter☆21Updated 3 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts.☆38Updated 6 years ago
- Thoughts toward and tutorial on corpus-driven narrative generation☆24Updated 4 years ago
- Fork of kingoflolz/mesh-transformer-jax with memory usage optimizations and support for GPT-Neo, GPT-NeoX, BLOOM, OPT and fairseq dense L…☆22Updated 2 years ago