nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed to help researchers study the evolution of science-fiction literature over time and test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆30 · Updated 6 years ago
Alternatives and similar repositories for SFGram-dataset:
Users who are interested in SFGram-dataset are comparing it to the repositories listed below:
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆104 · Updated 6 years ago
- The ScriptBase Corpus ☆42 · Updated 6 years ago
- A corpus of poetry from Project Gutenberg ☆194 · Updated 6 years ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this ☆216 · Updated last year
- ☆54 · Updated last year
- A Python package for cleaning Gutenberg books and dataset ☆33 · Updated last year
- Finds linguistic patterns effortlessly ☆35 · Updated last year
- A dataset of alignment research and code to reproduce it ☆73 · Updated last year
- A corpus and code for understanding norms and subjectivity. 🤖 ☆47 · Updated 4 months ago
- Fine tuning experiments for the GPT-2 model by OpenAI. ☆20 · Updated 5 years ago
- Analysis of the Gutenberg dataset ☆43 · Updated 6 years ago
- A classifier that distinguishes political from non-political news articles. ☆29 · Updated last year
- Promoting critical thinking through machine-generated prompts. ☆18 · Updated 3 years ago
- Metadata from Project Gutenberg ☆41 · Updated 3 weeks ago
- The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels. ☆39 · Updated last year
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆59 · Updated 8 months ago
- ☆29 · Updated 7 years ago
- Parse sentences to extract evoked frames. ☆10 · Updated 5 years ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆32 · Updated last year
- One-stop shop for all things carp ☆59 · Updated 2 years ago
- Practical Approaches to Data Science with Text ☆39 · Updated 5 years ago
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing. ☆19 · Updated 2 years ago
- A gathering of digital methods recipes for research, teaching and collaborations from across the Public Data Lab. ☆11 · Updated 11 months ago
- Weird A.I. Yankovic neural-net based lyrics parody generator ☆84 · Updated 2 years ago
- Cleaned up version of the PlotMachines code ☆64 · Updated last year
- ☆32 · Updated last year
- A deep learning model for extracting references from text ☆27 · Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆22 · Updated 2 months ago
- Annotation tool for coreference ☆32 · Updated last year
- Agents that build knowledge graphs and explore textual worlds by asking questions ☆79 · Updated last year