openzim / gutenberg
Scraper for downloading the entire ebooks repository of Project Gutenberg
☆150Updated last week
Alternatives and similar repositories for gutenberg
Users that are interested in gutenberg are comparing it to the libraries listed below
- A simple interface to the Project Gutenberg corpus.☆328Updated 2 years ago
- A toolchain of tasks for sequencing and fingerprinting book fulltext☆45Updated 10 months ago
- Documentation for the GITenberg books project☆29Updated 6 years ago
- I wanted all of plaintext Project Gutenberg in an easy-to-use format, so I made this☆223Updated 2 years ago
- Pages repo☆89Updated 3 years ago
- A tool for analyzing the word histories of a text.☆34Updated 7 months ago
- Grabbing all news.☆62Updated 5 years ago
- A Wikimedia Toolforge tool for exporting ebooks from Wikisources.☆83Updated last week
- Archive.org OPDS Bookserver - A standard for digital book distribution☆130Updated 6 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc.☆35Updated 2 months ago
- Interactive visualization of Wiktionary words and etymologies.☆93Updated this week
- Metadata from Project Gutenberg☆41Updated 2 months ago
- A command-line tool for interacting with books in git☆111Updated 10 months ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.☆144Updated last year
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity☆95Updated 6 years ago
- WARC and ARC indexing and discovery tools.☆124Updated 3 months ago
- An intelligent reading agent that understands text and translates it into Wikidata statements.☆116Updated 8 years ago
- A list of things related to software, literature, and other content for 🕣 Memento☆99Updated last year
- Python tools for processing data from the Catalog of Copyright Entries☆38Updated 5 years ago
- System for building, visualizing, and working with LDA topic models☆97Updated 2 weeks ago
- Scripts to auto-OCR PDFs, translate output using publicly-available or DIY NLP translation models, and generate epub/PDF☆43Updated last year
- An online annotation platform for teaching and learning in the humanities.☆108Updated this week
- Some tools to help analyze the twitter archive☆62Updated 2 weeks ago
- Collection of Python code to re-use across Python-based scrapers☆24Updated last month
- Making the public domain Loebs more easily downloadable. Data at https://github.com/ryanfb/loebolus-data☆96Updated this week
- Scripts for scraping metadata from Project Gutenberg books, via GITenberg.☆19Updated 6 years ago
- A set of utilities for processing MediaWiki XML dump data.☆55Updated 4 months ago
- Python library for reading and writing warc files☆241Updated 3 years ago
- An LL parser for extracting information from Wiki text, particularly Wiktionary.☆49Updated last year
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)☆161Updated 3 weeks ago