nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed for researchers to study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆32Updated 7 years ago
Alternatives and similar repositories for SFGram-dataset
Users who are interested in SFGram-dataset are comparing it to the repositories listed below
- Parse Sentences to extract evoked frames.☆10Updated 6 years ago
- ☆62Updated 2 years ago
- The AI Knowledge Editor☆186Updated 3 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo…☆114Updated 7 years ago
- A large scale Humor Dataset, containing more than 550k rated English jokes (LREC'20)☆71Updated 2 years ago
- Code for constructing TLDR corpus from Reddit dataset☆27Updated 4 years ago
- ☆24Updated last year
- Training & Implementation of chatbots leveraging GPT-like architecture with the aitextgen package to enable dynamic conversations.☆49Updated 3 years ago
- Corpus exploration platform using advanced tools such as interactive summarization and multi document coreference resolution☆12Updated 2 years ago
- Analysis of gutenberg dataset☆45Updated 6 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus☆203Updated last year
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.☆34Updated 2 years ago
- A corpus and code for understanding norms and subjectivity. 🤖☆52Updated last year
- Code for Stage-wise Fine-tuning for Graph-to-Text Generation☆26Updated 2 years ago
- ☆195Updated last year
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training☆45Updated 5 years ago
- One stop shop for all things carp☆59Updated 3 years ago
- Libraries, Archives and Museums (LAM)☆88Updated 3 years ago
- ☆100Updated last year
- A dataset for pretraining language models targeted for legal tasks.☆140Updated 3 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine☆31Updated 4 years ago
- The NewSHead dataset is a multi-doc headline dataset used in NHNet for training a headline summarization model.☆37Updated 3 years ago
- Evaluation suite for large-scale language models.☆129Updated 4 years ago
- Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids☆18Updated 9 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆67Updated 2 years ago
- A question-answering dataset with a focus on subjective information☆48Updated last year
- Repo for the LREC 2022 paper The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts.☆14Updated 3 years ago
- A corpus of poetry from Project Gutenberg☆210Updated 7 years ago
- Get answers to research questions from 200M+ papers. Link to demo -☆207Updated last month
- Repo for the paper "Detecting Logical Fallacies: From Quiz to Climate Change News" (2021)☆84Updated 2 years ago