puntonim / gutenberg-bulk-downloader
Bulk downloader for free ebooks hosted at Project Gutenberg
☆19 · Updated 3 years ago
Alternatives and similar repositories for gutenberg-bulk-downloader
Users who are interested in gutenberg-bulk-downloader are comparing it to the libraries listed below.
- Command-line corpus tools ☆9 · Updated 8 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆64 · Updated 8 years ago
- A tool for analyzing the word histories of a text. ☆34 · Updated 5 months ago
- A tool for automatic spelling normalization ☆20 · Updated 4 years ago
- Extract Data from Wikipedia Tables ☆34 · Updated 7 years ago
- Basic dataset for the linguistic data collection. ☆15 · Updated 8 years ago
- Recipes for training OpenNMT systems ☆14 · Updated 7 years ago
- Extract Data from Wikipedia Lists ☆31 · Updated 7 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 · Updated last year
- spaCy-to-naf converter ☆21 · Updated 11 months ago
- eXternally configurable REference and Non Named Entity Recognizer ☆17 · Updated 11 months ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆12 · Updated last year
- Scrapes some Finnish word definitions from English Wiktionary. ☆8 · Updated last year
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 · Updated 7 years ago
- The CIS OCR PostCorrectionTool ☆42 · Updated 2 years ago
- Easy language identification of 380 languages ☆17 · Updated 5 years ago
- CONLL-U to Pandas DataFrame ☆31 · Updated 7 years ago
- A Named-Entity Recogniser based on Grobid. ☆52 · Updated this week
- An intelligent reading agent that understands text and translates it into Wikidata statements. ☆115 · Updated 8 years ago
- A small tool that EXPLains spACY parse results. See what I did there? ☆84 · Updated 3 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Updated 4 years ago
- Specification of NAF, the NLP annotation format ☆21 · Updated 4 years ago
- A workflow system for Natural Language Processing. ☆21 · Updated 5 years ago
- A Python package for cleaning Gutenberg books and datasets ☆35 · Updated 2 weeks ago
- A web-based, token-level annotation tool for non-standard language data ☆10 · Updated 4 years ago
- Simple CORPORA list crawler ☆10 · Updated 8 years ago
- Multi Tier Annotation Search ☆26 · Updated 4 years ago
- A tool to extract canonical references from text. ☆20 · Updated 3 years ago
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- In-browser OCR of Ancient Greek and Latin ☆26 · Updated 3 weeks ago