nschaetti / SFGram-dataset

SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed to help researchers study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…

☆31

Alternatives and similar repositories for SFGram-dataset

Users who are interested in SFGram-dataset are comparing it to the libraries listed below.


markriedl / weirdai
Weird A.I. Yankovic neural-net based lyrics parody generator
☆84 · Updated 3 years ago
dhruvilgala / tvtropes
☆59 · Updated 2 years ago
julianbrooke / GutenTag
☆30 · Updated 8 years ago
allenai / dream
☆24 · Updated 9 months ago
kiasar / gutenberg_cleaner
A Python package for cleaning Gutenberg books and datasets
☆34 · Updated last month
aparrish / gutenberg-poetry-corpus
A corpus of poetry from Project Gutenberg
☆203 · Updated 6 years ago
EleutherAI / magiCARP
One stop shop for all things carp
☆59 · Updated 2 years ago
gsatallion8 / Framenet-Frame-Parser
Parse Sentences to extract evoked frames.
☆10 · Updated 6 years ago
anlausch / ArguminSci
Analyze Argumentation and Rhetorical Aspects in Scientific Writing.
☆19 · Updated 2 years ago
hrashkin / plotmachines
Cleaned up version of the PlotMachines code
☆66 · Updated 2 years ago
aparrish / gutenberg-dammit
I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this
☆223 · Updated 2 years ago
JonathanReeve / chapterize
A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo…
☆108 · Updated 6 years ago
revuel / PatternOmatic
Finds linguistic patterns effortlessly
☆36 · Updated last year
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆25 · Updated 6 months ago
tedunderwood / fictional-time-with-GPT4
An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.
☆33 · Updated 2 years ago
pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆184 · Updated last year
bigscience-workshop / lam
Libraries, Archives and Museums (LAM)
☆84 · Updated 2 years ago
cophi-wue / pydelta
An experimental implementation of Burrows' Delta in Python 3
☆21 · Updated 3 years ago
michaelmilleryoder / fanfiction-nlp
An NLP processing pipeline for characters in fanfiction. Developed by students at Carnegie Mellon University from 2019-2021.
☆33 · Updated 9 months ago
EdinburghNLP / scriptbase
The ScriptBase Corpus
☆44 · Updated 7 years ago
sdtblck / youtube_subtitle_dataset
YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training
☆43 · Updated 4 years ago
jsvine / buzzfeed-news-trending-strip
Dataset: BuzzFeed News “Trending” Strip, 2018–2023
☆19 · Updated 2 years ago
ricsonc / transformers-play-chess
A write-up of some experiments on a sequence model for chess games
☆30 · Updated 3 years ago
pgcorpus / gutenberg-analysis
Analysis of the Gutenberg dataset
☆44 · Updated 6 years ago
oughtinc / primer
Factored Cognition Primer: How to write compositional language model programs
☆49 · Updated 2 years ago
allenai / scruples
A corpus and code for understanding norms and subjectivity. 🤖
☆50 · Updated 8 months ago
jeffbinder / visions-and-revisions
Neural network poetry rewriter
☆21 · Updated 3 years ago
jbesomi / fastlaw
Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP tasks with legal texts.
☆38 · Updated 6 years ago
aparrish / corpus-driven-narrative-generation
Thoughts toward and tutorial on corpus-driven narrative generation
☆24 · Updated 4 years ago
VE-FORBRYDERNE / mesh-transformer-jax
Fork of kingoflolz/mesh-transformer-jax with memory usage optimizations and support for GPT-Neo, GPT-NeoX, BLOOM, OPT and fairseq dense L…
☆22 · Updated 2 years ago