A command-line program to download text corpora.
☆34 · Aug 12, 2017 · Updated 8 years ago
Alternatives and similar repositories for corpus-downloader
Users that are interested in corpus-downloader are comparing it to the libraries listed below.
- Scripts for scraping metadata from Project Gutenberg books, via GITenberg. ☆19 · Sep 11, 2018 · Updated 7 years ago
- The Art of Literary Text Analysis ☆169 · Apr 4, 2019 · Updated 7 years ago
- ☆19 · Jul 9, 2018 · Updated 7 years ago
- Plots various graphs for a series of plaintext files in a directory ☆19 · Jun 6, 2016 · Updated 9 years ago
- Work-in-progress list of funding opportunities for the digital humanities ☆14 · Jan 15, 2016 · Updated 10 years ago
- A simple tool for splitting up an ebook into its chapters. Works well with Project Gutenberg texts. May also be used to clean up books fo… ☆116 · Sep 8, 2018 · Updated 7 years ago
- Training a classifier to reddit's TIL to find new things on Wikipedia ☆35 · Sep 25, 2015 · Updated 10 years ago
- spaCy-to-naf converter ☆21 · Jun 10, 2025 · Updated 10 months ago
- A structured list of text corpora, created for use with a corpus downloader. ☆13 · Aug 27, 2016 · Updated 9 years ago
- A digital humanities operating system that runs on a USB disk. ☆32 · Jul 5, 2017 · Updated 8 years ago
- Client to browse and edit PeriodO data ☆17 · Apr 10, 2026 · Updated last week
- XSLT for converting TEI MsDescription to IIIF manifests ☆13 · Oct 18, 2016 · Updated 9 years ago
- Regex like pattern tree matching but on sentence's tree instead of Strings ☆42 · Mar 6, 2018 · Updated 8 years ago
- ☆11 · Nov 14, 2021 · Updated 4 years ago
- Materials related to the Project Laboratory session of #GCDRI ☆16 · Nov 29, 2017 · Updated 8 years ago
- bin files ☆13 · Jan 30, 2025 · Updated last year
- Text-Induced Corpus Clean-up ☆20 · Jun 20, 2023 · Updated 2 years ago
- (Mental) maps of texts with kernel density estimation and force-directed networks. ☆108 · Jun 22, 2015 · Updated 10 years ago
- Named Entity Recognition tool for Europeana Newspapers ☆14 · Apr 5, 2018 · Updated 8 years ago
- Plugin to use rich text in Annotator ☆30 · Oct 7, 2014 · Updated 11 years ago
- InfiniteUlysses.com repo as it was when I finished the related Ph.D. project. See instead github.com/amandavisconti/infinite-ulysses-publ…