pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆184 · Updated last year
Alternatives and similar repositories for gutenberg
Users who are interested in gutenberg are comparing it to the libraries listed below.
- A module to compute textual lexical richness (aka lexical diversity). ☆108 · Updated last year
- Linguistic and stylistic complexity measures for (literary) texts ☆81 · Updated last year
- Utility for behavioral and representational analyses of Language Models ☆146 · Updated last week
- Package to extract connotation frames ☆85 · Updated last year
- Analysis of gutenberg dataset ☆44 · Updated 6 years ago
- Dutch coreference resolution & dialogue analysis using deterministic rules ☆21 · Updated 2 years ago
- ☆35 · Updated last week
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆127 · Updated 3 years ago
- The Benchmark of Linguistic Minimal Pairs ☆150 · Updated 2 years ago
- A Python wrapper around the topic modeling functions of MALLET. ☆102 · Updated 7 months ago
- A minimal, pure Python library to interface with CoNLL-U format files. ☆151 · Updated last year
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆31 · Updated 3 months ago
- This is a simple Python package for calculating a variety of lexical diversity indices ☆77 · Updated last year
- Contextualised Word Representations for Lexical Semantic Change Analysis ☆31 · Updated 4 years ago
- ☆12 · Updated last year
- A corpus and code for understanding norms and subjectivity. 🤖 ☆49 · Updated 8 months ago
- Python Multilingual Ucrel Semantic Analysis System ☆31 · Updated 9 months ago
- A python true casing utility that restores case information for texts ☆88 · Updated 2 years ago
- An easy-to-use library to extract indices from texts. ☆29 · Updated 3 years ago
- ☆53 · Updated last year
- The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit ☆57 · Updated 2 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆101 · Updated last year
- Wikipedia text corpus for self-supervised NLP model training ☆44 · Updated 2 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆108 · Updated 6 years ago
- Searching in-memory corpus with Corpus Query Language (CQL) ☆19 · Updated 6 months ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆80 · Updated 11 months ago
- Data for the HIPE 2022 shared task. ☆18 · Updated last year
- Data, codebook, and models to automatically detect storytelling. ☆19 · Updated last month
- 📃 Language Model based sentences scoring library ☆308 · Updated 3 years ago
- MAGPIE: A sense-annotated corpus of potentially idiomatic expressions ☆27 · Updated 5 years ago