pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆167 · Updated last year
Alternatives and similar repositories for gutenberg:
Users interested in gutenberg are also comparing it to the libraries listed below:
- The Benchmark of Linguistic Minimal Pairs (☆148, updated 2 years ago)
- Analysis of gutenberg dataset (☆43, updated 6 years ago)
- Utility for behavioral and representational analyses of Language Models (☆128, updated last week)
- Linguistic and stylistic complexity measures for (literary) texts (☆79, updated last year)
- A module to compute textual lexical richness (aka lexical diversity). (☆98, updated last year)
- Package to extract connotation frames (☆83, updated last year)
- Python Multilingual Ucrel Semantic Analysis System (☆31, updated 6 months ago)
- This is a simple Python package for calculating a variety of lexical diversity indices (☆71, updated last year)
- A multilingual lexicon of words to hurt. (☆83, updated 3 months ago)
- Python Finite-State Toolkit (☆50, updated last month)
- A Python wrapper around the topic modeling functions of MALLET. (☆101, updated 3 months ago)
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. (☆126, updated 3 years ago)
- (no description; ☆44, updated 2 years ago)
- Python version of Doug Biber's Multidimensional Analysis (MDA) (☆29, updated 2 months ago)
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… (☆104, updated 6 years ago)
- (no description; ☆19, updated 3 years ago)
- Searching in-memory corpus with Corpus Query Language (CQL) (☆19, updated 2 months ago)
- Dutch coreference resolution & dialogue analysis using deterministic rules (☆21, updated last year)
- SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages (☆8, updated last year)
- Automated Semantic Analysis of Discourse Markers (☆10, updated 2 years ago)
- A repository with several curated datasets of counter-narratives to fight online hate speech. (☆88, updated last year)
- Natural Language Processing Research in North American Linguistics Departments (☆20, updated 3 months ago)
- Repository for code and metadata to support work described in "Authorless Topic Models: Biasing Models Away from Known Structure" (☆28, updated 4 years ago)
- (no description; ☆164, updated 2 years ago)
- The central repo for Creole based NLU and NLG work (☆17, updated 8 months ago)
- XED multilingual emotion datasets (☆57, updated last year)
- This repository houses the IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated se… (☆19, updated 3 years ago)
- English Small World of Words SWOWEN-2018 (☆66, updated 2 years ago)
- Easier Automatic Sentence Simplification Evaluation (☆160, updated last year)
- Code and data for the WSDM '21 paper "Quotebank: A Corpus of Quotations from a Decade of News" (☆19, updated 3 years ago)