JonathanReeve / chapterize
A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books for computational text analysis.
☆89 Updated 6 years ago
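To make the idea of splitting a book into chapters concrete, here is a minimal sketch in Python. It is not taken from chapterize and does not claim to reflect how that tool works; the heading pattern `HEADING` and the helper `split_chapters` are hypothetical names introduced only for illustration, and the pattern assumes headings of the form "CHAPTER I" or "Chapter 12" on their own line.

```python
import re

# Hypothetical heading pattern (an assumption, not chapterize's actual rule):
# a line starting with "CHAPTER"/"Chapter" followed by an Arabic or Roman numeral.
HEADING = re.compile(
    r"^[ \t]*(?:CHAPTER|Chapter)[ \t]+(?:\d+|[IVXLCDM]+)\b.*$",
    re.MULTILINE,
)


def split_chapters(text):
    """Return a list of (heading, body) pairs, one per detected heading.

    Text before the first heading (front matter) is discarded.
    """
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading line
        # to the start of the next heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        heading = match.group(0).strip()
        body = text[match.end():end].strip()
        chapters.append((heading, body))
    return chapters


sample = (
    "Preface text\n\n"
    "CHAPTER I\n\nIt was a dark night.\n\n"
    "CHAPTER II\n\nMorning came.\n"
)
chapters = split_chapters(sample)
for heading, body in chapters:
    print(heading, "->", body)
```

Real books vary widely in how they mark chapters (spelled-out numbers, bare numerals, titles without the word "chapter"), so a single pattern like this one would likely need extending for any given corpus.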
Related projects:
- The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels. ☆38 Updated 11 months ago
- Python Multilingual Ucrel Semantic Analysis System ☆29 Updated last month
- Repository for code and metadata to support work described in "Authorless Topic Models: Biasing Models Away from Known Structure" ☆27 Updated 4 years ago
- [LREC 2020] EtymDB, an Etymological DataBase (v2.1) ☆21 Updated 2 years ago
- A Python package for cleaning Gutenberg books and dataset ☆30 Updated last year
- Latin BERT ☆56 Updated 2 months ago
- Analysis of the Gutenberg dataset ☆40 Updated 5 years ago
- Digital Humanities Across Borders ☆46 Updated 5 months ago
- Package to extract connotation frames ☆78 Updated 9 months ago
- Preliminary spaCy models for Latin ☆14 Updated last year
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆124 Updated 3 years ago
- (no description) ☆27 Updated 7 years ago
- I.PHI dataset generation ☆25 Updated 9 months ago
- Poetic processing, for Python. ☆36 Updated 4 months ago
- Annotation tool for coreference ☆31 Updated last year
- A command-line program to download text corpora. ☆33 Updated 7 years ago
- Python API to access glottolog/glottolog ☆27 Updated 6 months ago
- linguistics backend ☆40 Updated last year
- Linguistic and stylistic complexity measures for (literary) texts ☆76 Updated 7 months ago
- A Python wrapper around the topic modeling functions of MALLET. ☆99 Updated 2 years ago
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆60 Updated this week
- Linguistic Analysis Command-Line Tool ☆14 Updated 4 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆155 Updated 8 months ago
- An experiment replicating part of "Why Literary Time is Measured in Minutes" with GPT-4. ☆32 Updated last year
- Practical Approaches to Data Science with Text ☆38 Updated 4 years ago
- Project on the history of genre. ☆22 Updated 4 years ago
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing. ☆19 Updated last year
- Workshop materials for our DH2018 workshop on word vectors. Created by Eun Seo Jo, Javier de la Rosa, and Scott Bailey ☆15 Updated 6 years ago
- https://sites.google.com/site/multidimensionaltagger ☆26 Updated 9 months ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆27 Updated 3 months ago