JonathanReeve / chapterize
A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books for computational text analysis.
☆104 · Updated 6 years ago
Alternatives and similar repositories for chapterize:
Users who are interested in chapterize are comparing it to the libraries listed below:
- The official repository for the Project Dialogism Novel Corpus, a dataset of annotated quotations in full-length English novels. ☆39 · Updated last year
- Package to extract connotation frames ☆83 · Updated last year
- An NLP processing pipeline for characters in fanfiction. Developed by students at Carnegie Mellon University from 2019-2021. ☆31 · Updated 5 months ago
- A Python wrapper around the topic modeling functions of MALLET. ☆101 · Updated 3 months ago
- Repository for code and metadata to support work described in "Authorless Topic Models: Biasing Models Away from Known Structure" ☆28 · Updated 4 years ago
- Latin BERT ☆58 · Updated 7 months ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆167 · Updated last year
- Preliminary spaCy models for Latin ☆14 · Updated 2 years ago
- High-performance text aligner for large collections of texts ☆48 · Updated 3 months ago
- Analyze Argumentation and Rhetorical Aspects in Scientific Writing. ☆19 · Updated 2 years ago
- A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Sp… ☆29 · Updated 3 years ago
- Digital Humanities Across Borders ☆47 · Updated 10 months ago
- ☆67 · Updated 11 months ago
- Python Multilingual Ucrel Semantic Analysis System ☆31 · Updated 5 months ago
- ☆30 · Updated 7 years ago
- Practical Approaches to Data Science with Text ☆39 · Updated 5 years ago
- Python version for Doug Biber's Multidimensional Analysis (MDA) ☆29 · Updated 2 months ago
- ☆27 · Updated last year
- An experimental implementation of Burrows' Delta in Python 3 ☆20 · Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 · Updated 3 years ago
- Linguistic and stylistic complexity measures for (literary) texts ☆79 · Updated last year
- Dutch coreference resolution & dialogue analysis using deterministic rules ☆21 · Updated last year
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆126 · Updated 3 years ago
- Scripts that clean up OCR and munge Hathi metadata. ☆76 · Updated 7 years ago
- Code and data supporting "NovelTM Data Sets for English-Language Fiction." ☆23 · Updated 4 years ago
- Annotation tool for coreference ☆32 · Updated last year
- Explore your own text collection with a topic model – without prior knowledge. ☆62 · Updated last month
- A simple toolkit for conducting analyses using corpus methods ☆25 · Updated 3 years ago
- A Python package for cleaning Gutenberg books and datasets ☆34 · Updated last year
- A command-line program to download text corpora. ☆34 · Updated 7 years ago