pgcorpus / gutenberg
Pipeline to generate the Standardized Project Gutenberg Corpus
☆167 · Updated last year
Alternatives and similar repositories for gutenberg:
Users who are interested in gutenberg compare it to the libraries listed below.
- A module to compute textual lexical richness (aka lexical diversity). ☆98 · Updated last year
- Utility for behavioral and representational analyses of Language Models ☆127 · Updated last month
- This is a simple Python package for calculating a variety of lexical diversity indices ☆67 · Updated last year
- ☆44 · Updated 2 years ago
- The Benchmark of Linguistic Minimal Pairs ☆144 · Updated 2 years ago
- Python Multilingual Ucrel Semantic Analysis System ☆31 · Updated 5 months ago
- Package to extract connotation frames ☆81 · Updated last year
- ☆199 · Updated last week
- Analysis of gutenberg dataset ☆42 · Updated 6 years ago
- The ScriptBase Corpus ☆42 · Updated 6 years ago
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. ☆68 · Updated 3 years ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆28 · Updated last month
- Python package to read and write CLDF datasets ☆15 · Updated this week
- Easier Automatic Sentence Simplification Evaluation ☆160 · Updated last year
- This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs ☆173 · Updated last year
- Linguistic and stylistic complexity measures for (literary) texts ☆79 · Updated 11 months ago
- Contextualised Word Representations for Lexical Semantic Change Analysis ☆31 · Updated 4 years ago
- A Python wrapper around the topic modeling functions of MALLET. ☆102 · Updated 2 months ago
- English Small World of Words SWOWEN-2018 ☆66 · Updated 2 years ago
- STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses) ☆64 · Updated last year
- Natural Language Processing Research in North American Linguistics Departments ☆19 · Updated 2 months ago
- ☆163 · Updated 2 years ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆45 · Updated 3 months ago
- An initiative to collect and distribute resources for co-reference resolution in a unified standard. ☆24 · Updated 8 months ago
- The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit ☆57 · Updated last year
- University of Colorado VerbNet ☆103 · Updated 7 months ago
- Code and data for the WSDM '21 paper "Quotebank: A Corpus of Quotations from a Decade of News" ☆19 · Updated 3 years ago
- A Python package to run inference with HuggingFace language and vision-language checkpoints wrapping many convenient features. ☆25 · Updated 4 months ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆45 · Updated last year
- SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages ☆8 · Updated 11 months ago