c-w / gutenberg
A simple interface to the Project Gutenberg corpus.
☆331 Updated 3 years ago
Alternatives and similar repositories for gutenberg
Users that are interested in gutenberg are comparing it to the libraries listed below
- Natural language processing pipeline for book-length documents (archival Java version; for current Python version, see: https://github.co… ☆316 Updated 4 years ago
- Python scripts for retrieving CSV data from the Google Ngram Viewer and plotting it in XKCD style. The Python script for retrieving ngram… ☆254 Updated 5 years ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this ☆225 Updated 2 years ago
- A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor. ☆319 Updated 8 years ago
- A simple interface for the CMU pronouncing dictionary ☆318 Updated last year
- A command-line program to download text corpora. ☆34 Updated 8 years ago
- A corpus of poetry from Project Gutenberg ☆212 Updated 7 years ago
- An HTTP interface to the Project Gutenberg corpus. ☆77 Updated 6 years ago
- Analyse rhyme scheme, metre and form of poems ☆132 Updated 4 years ago
- The Art of Literary Text Analysis ☆168 Updated 6 years ago
- a python package for cleaning Gutenberg books and dataset ☆34 Updated 9 months ago
- a collection of functions that measure the readability of a given body of text ☆196 Updated 8 years ago
- A simple Python interface for Darius Kazemi's Corpora Project. ☆121 Updated 6 years ago
- A Python module for interfacing with the Treetagger by Helmut Schmid. ☆76 Updated 8 months ago
- ☆210 Updated 4 years ago
- ☆31 Updated 8 years ago
- Collection of tools for building diachronic/historical word vectors ☆445 Updated 2 years ago
- PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, an… ☆477 Updated 2 years ago
- ☆98 Updated 4 years ago
- Python package for stylometry ☆64 Updated 4 years ago
- System for building, visualizing, and working with LDA topic models ☆97 Updated 2 weeks ago
- A textual corpus database for the digital humanities. ☆63 Updated 5 years ago
- NLTK Contrib ☆169 Updated last year
- ☆34 Updated 4 years ago
- A toolkit for corpus linguistics ☆206 Updated 6 years ago
- Topic Words in Context (TWiC) is a highly-interactive, browser-based visualization for MALLET topic models ☆51 Updated 8 years ago
- A point-and-click tool for creating and analyzing topic models produced by MALLET. ☆113 Updated 4 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆114 Updated 7 years ago
- Practical Approaches to Data Science with Text ☆39 Updated 6 years ago
- Quickly extract multi-word phrases from a corpus ☆195 Updated 5 years ago