pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆184Updated last year
Alternatives and similar repositories for gutenberg
Users who are interested in gutenberg are comparing it to the libraries listed below.
- A module to compute textual lexical richness (aka lexical diversity).☆109Updated last year
- This is a simple Python package for calculating a variety of lexical diversity indices☆77Updated last year
- Analysis of gutenberg dataset☆44Updated 6 years ago
- Utility for behavioral and representational analyses of Language Models☆148Updated last week
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates☆49Updated 2 years ago
- English Small World of Words SWOWEN-2018☆67Updated 2 years ago
- Python Finite-State Toolkit☆56Updated this week
- Python Multilingual Ucrel Semantic Analysis System☆31Updated 10 months ago
- Unannotated Spanish 3 Billion Words Corpora☆101Updated 2 years ago
- Contextualised Word Representations for Lexical Semantic Change Analysis☆31Updated 4 years ago
- Easier Automatic Sentence Simplification Evaluation☆162Updated last year
- Linguistic and stylistic complexity measures for (literary) texts☆81Updated last year
- The Benchmark of Linguistic Minimal Pairs☆151Updated 2 years ago
- Package to extract connotation frames☆85Updated last year
- Build a dialog dataset from online books in many languages☆74Updated 2 years ago
- Python version for Doug Biber's Multidimensional Analysis (MDA)☆33Updated 3 months ago
- How Contextual are Contextualized Word Representations?☆41Updated 5 years ago
- A Python package for learning, evaluating, annotating, and extracting vector representations of construction grammars☆39Updated 8 months ago
- A minimal, pure Python library to interface with CoNLL-U format files.☆151Updated 2 years ago
- A Python wrapper around the topic modeling functions of MALLET.☆103Updated 7 months ago
- Natural language understanding benchmarks for Norwegian☆14Updated last year
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)☆66Updated 3 weeks ago
- A simple toolkit for conducting analyses using corpus methods☆25Updated 3 years ago
- The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels.☆39Updated last year
- ☆53Updated last year
- An easy-to-use library to extract indices from texts.☆29Updated 3 years ago
- A python true casing utility that restores case information for texts☆89Updated 2 years ago
- ☆44Updated 2 years ago
- ☆191Updated last year
- Unsupervised method for extracting quotation-speaker pairs from large news corpora.☆29Updated 6 years ago