nschaetti / SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public science-fiction novels, books and movie covers. It is designed for researchers who study the evolution of science-fiction literature over time and who test machine learning algorithms on authorship attribution and document classification tasks. All the documents are now published o…
☆32Updated 7 years ago
Alternatives and similar repositories for SFGram-dataset
Users who are interested in SFGram-dataset are comparing it to the repositories listed below.
- ☆62Updated 2 years ago
- Weird A.I. Yankovic neural-net based lyrics parody generator☆84Updated 3 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo…☆114Updated 7 years ago
- Metadata from Project Gutenberg☆41Updated this week
- A large scale Humor Dataset, containing more than 550k rated English jokes (LREC'20)☆72Updated 2 years ago
- A corpus and code for understanding norms and subjectivity. 🤖☆52Updated last year
- Code for the paper: "Large Language Models as Corporate Lobbyists" (2023).☆171Updated 2 years ago
- German GPT-2 model☆32Updated 4 years ago
- A corpus of poetry from Project Gutenberg☆210Updated 7 years ago
- Finds linguistic patterns effortlessly☆39Updated 2 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus☆206Updated 2 years ago
- ☆24Updated last year
- Libraries, Archives and Museums (LAM)☆88Updated 3 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆26Updated 3 years ago
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts.☆40Updated 6 years ago
- A dataset of alignment research and code to reproduce it☆78Updated 2 years ago
- One stop shop for all things carp☆59Updated 3 years ago
- ☆195Updated last year
- Repository for Zheng and Guha et al., 2021, "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Data…☆95Updated 2 years ago
- A demonstration of how a toy (but usable!) semantic search engine can be quickly built using Cohere's platform.☆117Updated 2 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine☆31Updated 4 years ago
- YT_subtitles - extracts subtitles from YouTube videos to raw text for Language Model training☆45Updated 5 years ago
- Logical Operations On Puzzles: Simple Iterative Reasoning Tests for LLMs first through wordgrids☆18Updated 10 months ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this☆224Updated 2 years ago
- A Python package for cleaning Gutenberg books and datasets☆34Updated 8 months ago
- A dataset for pretraining language models targeted for legal tasks.☆140Updated 3 years ago
- ☆44Updated 3 years ago
- human_detectors hosts the data released from the paper "People who frequently use ChatGPT for writing tasks are accurate and robust detec…☆43Updated 8 months ago
- GPT-4 Passes the Bar☆28Updated 2 years ago
- LegalCrawler: A tool for automated scraping of English legal corpora☆59Updated 3 years ago