nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books, and movie covers. It is designed to let researchers study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship-attribution and document-classification tasks. All the documents are now published o…
☆33 · Updated 7 years ago
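Since the dataset is pitched at authorship-attribution and document-classification experiments, a minimal baseline for that task can be sketched with the standard library alone. Everything below is illustrative: the author labels and text snippets are invented stand-ins, not actual SFGram files, and the model is a plain multinomial Naive Bayes over word counts with Laplace smoothing.

```python
# Hypothetical authorship-attribution baseline (toy data, not SFGram itself):
# multinomial Naive Bayes over word counts, standard library only.
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(docs):
    """docs: list of (author, text) pairs.
    Returns per-author word counts and log-priors over authors."""
    counts = defaultdict(Counter)
    doc_totals = Counter()
    for author, text in docs:
        counts[author].update(tokenize(text))
        doc_totals[author] += 1
    total_docs = sum(doc_totals.values())
    priors = {a: math.log(n / total_docs) for a, n in doc_totals.items()}
    return counts, priors

def classify(counts, priors, text):
    """Pick the author maximizing log P(author) + sum of log P(word | author)."""
    vocab = set().union(*counts.values())
    scores = {}
    for author, c in counts.items():
        denom = sum(c.values()) + len(vocab)  # Laplace-smoothing denominator
        score = priors[author]
        for w in tokenize(text):
            score += math.log((c[w] + 1) / denom)
        scores[author] = score
    return max(scores, key=scores.get)

# Invented snippets standing in for real novel excerpts:
train_docs = [
    ("verne", "the submarine descended beneath the waves of the ocean"),
    ("verne", "the captain charted a course across the mysterious ocean"),
    ("wells", "the martians advanced upon london with their machines"),
    ("wells", "the time traveller stepped into his machine"),
]
model = train(train_docs)
print(classify(*model, "the ocean waves hid the submarine"))  # → verne
```

On the real corpus one would read each novel's text file, label it with its author, and hold out some documents for evaluation; the same two functions apply unchanged.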
Alternatives and similar repositories for SFGram-dataset
Users interested in SFGram-dataset are comparing it to the libraries listed below.
- ☆64 · Updated 2 years ago
- Weird A.I. Yankovic, a neural-net-based lyrics parody generator ☆84 · Updated 4 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆207 · Updated 2 years ago
- A corpus of poetry from Project Gutenberg ☆212 · Updated 7 years ago
- A simple tool for splitting an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆114 · Updated 7 years ago
- Finds linguistic patterns effortlessly ☆39 · Updated 2 years ago
- Analysis of the Gutenberg dataset ☆44 · Updated 7 years ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆53 · Updated last year
- Libraries, Archives and Museums (LAM) ☆88 · Updated 3 years ago
- human_detectors hosts the data released with the paper "People who frequently use ChatGPT for writing tasks are accurate and robust detec… ☆44 · Updated 8 months ago
- A dataset of alignment research and code to reproduce it ☆78 · Updated 2 years ago
- Parses sentences to extract evoked frames ☆10 · Updated 6 years ago
- Factored Cognition Primer: How to write compositional language model programs ☆50 · Updated 2 years ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4 ☆34 · Updated 2 years ago
- One-stop shop for all things carp ☆59 · Updated 3 years ago
- Dataset accompanying the paper "Investigating African-American Vernacular English in Transformer-Based Text Generation" ☆10 · Updated 3 years ago
- A Python package for cleaning Gutenberg books and datasets ☆34 · Updated 8 months ago
- Code for constructing a TLDR corpus from the Reddit dataset ☆27 · Updated 4 years ago
- Agents that build knowledge graphs and explore textual worlds by asking questions ☆79 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata dataset with the Weaviate vector search engine ☆31 · Updated 4 years ago
- The AI Knowledge Editor ☆184 · Updated 3 years ago
- Fastlaw replaces generic word embeddings for supervised machine-learning NLP tasks on legal texts ☆40 · Updated 6 years ago
- This Python module can be used to obtain antonyms, synonyms, hypernyms, hyponyms, homophones, and definitions ☆127 · Updated last year
- Documentation effort for the BookCorpus dataset ☆34 · Updated 4 years ago
- ☆24 · Updated last year
- ☆196 · Updated last year
- The repo containing the Critical Role Dungeons and Dragons Dataset ☆143 · Updated last year
- YT_subtitles: extracts subtitles from YouTube videos to raw text for language-model training ☆45 · Updated 5 years ago
- A simple interface to the Project Gutenberg corpus ☆331 · Updated 3 years ago