pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆159 Updated 10 months ago
Related projects
Alternatives and complementary repositories for gutenberg
- Utility for behavioral and representational analyses of Language Models ☆122 Updated this week
- Natural Language Processing Research in North American Linguistics Departments ☆18 Updated 2 weeks ago
- A module to compute textual lexical richness (aka lexical diversity). ☆93 Updated last year
- Contextualised Word Representations for Lexical Semantic Change Analysis ☆31 Updated 4 years ago
- Analysis of gutenberg dataset ☆40 Updated 5 years ago
- The Benchmark of Linguistic Minimal Pairs ☆142 Updated last year
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆63 Updated last year
- English Small World of Words SWOWEN-2018 ☆66 Updated 2 years ago
- Linguistic and stylistic complexity measures for (literary) texts ☆77 Updated 10 months ago
- ☆11 Updated 8 months ago
- Mining individual characters in multiparty dialogue ☆165 Updated last year
- An initiative to collect and distribute resources for co-reference resolution in a unified standard. ☆24 Updated 6 months ago
- ☆44 Updated 2 years ago
- This is a simple Python package for calculating a variety of lexical diversity indices ☆65 Updated last year
- Python Finite-State Toolkit ☆45 Updated 2 weeks ago
- ☆19 Updated 3 years ago
- ☆32 Updated last week
- SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages ☆7 Updated 9 months ago
- The ScriptBase Corpus ☆42 Updated 6 years ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆27 Updated 5 months ago
- Switchboard Dialog Act Corpus with Penn Treebank links ☆139 Updated 3 years ago
- Python Multilingual Ucrel Semantic Analysis System ☆30 Updated 3 months ago
- A Python wrapper around the topic modeling functions of MALLET. ☆99 Updated 3 weeks ago
- Annotated dataset of 100 works of fiction to support tasks in natural language processing and the computational humanities. ☆343 Updated last year
- Diagnostic tests for linguistic capacities in language models ☆66 Updated 2 years ago
- Easier Automatic Sentence Simplification Evaluation ☆159 Updated last year
- Official repository for Semlink resources ☆32 Updated 2 years ago
- CD20200004 from 01/01/2021 to 31/12/2023 - LIG UGA - Python Notebook and Models for the MT Lab @ ALPS 2022 ☆14 Updated 7 months ago
- This repository houses the IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated se… ☆19 Updated 3 years ago
- Source codes for the paper "Examining the Ordering of Rhetorical Strategies in Persuasive Requests" ☆17 Updated 3 years ago