nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed to be used by researchers to study the evolution of science-fiction literature over time and to test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆32Updated 6 years ago
Alternatives and similar repositories for SFGram-dataset
Users that are interested in SFGram-dataset are comparing it to the libraries listed below
- a python package for cleaning Gutenberg books and dataset☆34Updated 2 months ago
- A corpus and code for understanding norms and subjectivity. 🤖☆50Updated 9 months ago
- Libraries, Archives and Museums (LAM)☆84Updated 2 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus☆189Updated last year
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing.☆19Updated 2 years ago
- Code for constructing TLDR corpus from Reddit dataset☆25Updated 3 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo…☆109Updated 6 years ago
- Releases for the reddit-graph project☆18Updated last year
- Finds linguistic patterns effortlessly☆37Updated last year
- Weird A.I. Yankovic neural-net based lyrics parody generator☆84Updated 3 years ago
- ☆60Updated 2 years ago
- MinScIE is an Open Information Extraction system which provides structured knowledge enriched with semantic information about citations.☆15Updated 6 years ago
- Analysis of gutenberg dataset☆45Updated 6 years ago
- Socrates is a thin wrapper around an early-stage [AllenNLP](https://allennlp.org/) model that enables machine reading comprehension (MRC)…☆14Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by the Incorporated…☆25Updated 2 years ago
- Agents that build knowledge graphs and explore textual worlds by asking questions☆79Updated last year
- I.PHI dataset generation☆25Updated last year
- Unreliable News Index (for Columbia Journalism Review)☆56Updated 3 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts.☆38Updated 6 years ago
- MultiCite code and data. Models are available on Huggingface.☆31Updated 3 years ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4.☆34Updated 2 years ago
- Open Access PDF harvester, metadata aggregator and full-text ingester☆61Updated last year
- A highly sophisticated sequence-to-sequence model for code generation☆40Updated 4 years ago
- ☆34Updated 2 years ago
- A collection of open source tools and resources related to Wikibase knowledge graphs☆72Updated last year
- Code for generating Quasimodo, a commonsense knowledge base.☆20Updated 3 years ago
- Homebase of the IPTC EXTRA project about rule-based text categorization☆13Updated 8 years ago
- ☆24Updated 10 months ago
- The ScriptBase Corpus☆44Updated 7 years ago
- An experimental implementation of Burrows' Delta in Python 3☆21Updated 3 years ago