pgcorpus / gutenberg-analysis
Analysis of the Gutenberg dataset
☆40 · Updated 5 years ago
Related projects:
- An implementation of GrASP (Shnarch et al., 2017) · ☆21 · Updated 2 years ago
- Align the token outputs from spaCy and Hugging Face to help understand what language structures transformers see · ☆44 · Updated 2 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… · ☆72 · Updated 2 months ago
- BERT and ELECTRA models trained on Europeana Newspapers · ☆35 · Updated 2 years ago
- Bayesian Assessment of Hypotheses · ☆24 · Updated last year
- ParaNames: A multilingual resource for parallel names · ☆30 · Updated 4 months ago
- An example of how to use spaCy for extremely large files without running into memory issues · ☆36 · Updated 2 years ago
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing. · ☆19 · Updated last year
- This repository contains papers and resources pertaining to hate speech research. · ☆42 · Updated 3 years ago
- Finds linguistic patterns effortlessly · ☆31 · Updated last year
- The Universal Decompositional Semantics (UDS) dataset and the Decomp toolkit · ☆55 · Updated last year
- Linguistic and stylistic complexity measures for (literary) texts · ☆76 · Updated 7 months ago
- A Python package for cleaning Gutenberg books and datasets · ☆30 · Updated last year
- This is a simple Python package for calculating a variety of lexical diversity indices · ☆66 · Updated last year
- A Word Sense Disambiguation system integrating implicit and explicit external knowledge. · ☆66 · Updated 3 years ago
- MultiCite code and data. Models are available on Hugging Face. · ☆28 · Updated 2 years ago
- Featurize words into orthographic and phonological vectors. · ☆39 · Updated last year
- classy is a simple-to-use library for building high-performance Machine Learning models in NLP. · ☆85 · Updated 2 months ago
- Statistics on multilingual datasets · ☆17 · Updated 2 years ago
- Repo for the LREC 2022 paper "The Project Dialogism Novel Corpus: A Dataset for Quotation Attribution in Literary Texts". · ☆13 · Updated 2 years ago
- CrowdTruth framework for crowdsourcing ground truth for training & evaluation of AI systems · ☆57 · Updated 5 months ago
- Pipeline to generate the Standardized Project Gutenberg Corpus · ☆155 · Updated 8 months ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. · ☆27 · Updated last year
- REMERGE: Multi-Word Expression discovery algorithm · ☆14 · Updated last year
- How Contextual are Contextualized Word Representations? · ☆39 · Updated 4 years ago