pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆171 Updated last year
Alternatives and similar repositories for gutenberg:
Users who are interested in gutenberg compare it to the libraries listed below.
- A module to compute textual lexical richness (aka lexical diversity). ☆104 Updated last year
- University of Colorado VerbNet ☆104 Updated 10 months ago
- ☆19 Updated 3 years ago
- Linguistic and stylistic complexity measures for (literary) texts ☆80 Updated last year
- This is a simple Python package for calculating a variety of lexical diversity indices ☆73 Updated last year
- A minimal, pure Python library to interface with CoNLL-U format files. ☆149 Updated last year
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆65 Updated 2 years ago
- Utility for behavioral and representational analyses of Language Models ☆132 Updated 2 weeks ago
- Analysis of gutenberg dataset ☆44 Updated 6 years ago
- Package to extract connotation frames ☆83 Updated last year
- An initiative to collect and distribute resources for co-reference resolution in a unified standard. ☆24 Updated 10 months ago
- Dutch coreference resolution & dialogue analysis using deterministic rules ☆21 Updated last year
- Repository for the Georgetown University Multilayer Corpus (GUM) ☆93 Updated last week
- Python Finite-State Toolkit ☆53 Updated last month
- The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit ☆57 Updated last year
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆107 Updated 6 years ago
- Contextualised Word Representations for Lexical Semantic Change Analysis ☆31 Updated 4 years ago
- ☆34 Updated 2 weeks ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆47 Updated last year
- The Benchmark of Linguistic Minimal Pairs ☆149 Updated 2 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars ☆36 Updated 5 months ago
- ☆52 Updated last year
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆30 Updated 3 weeks ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆68 Updated 3 years ago
- Natural language processing pipeline for book-length documents (archival Java version; for current Python version, see: https://github.co… ☆313 Updated 3 years ago
- A multilingual lexicon of words to hurt. ☆86 Updated 4 months ago
- A Python wrapper around the topic modeling functions of MALLET. ☆101 Updated 4 months ago
- Tools for compiling corpora from Common Crawl ☆13 Updated 4 months ago
- Python framework for processing Universal Dependencies data ☆55 Updated last week
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆37 Updated 2 years ago