puntonim / gutenberg-bulk-downloader
Bulk downloader for free ebooks hosted at Project Gutenberg
☆19 · Updated 3 years ago
Alternatives and similar repositories for gutenberg-bulk-downloader
Users interested in gutenberg-bulk-downloader are comparing it to the repositories listed below.
- Pipeline for distributed Natural Language Processing, made in Python ☆65 · Updated 8 years ago
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- eXternally configurable REference and Non Named Entity Recognizer ☆17 · Updated last year
- Recipes for training OpenNMT systems ☆14 · Updated 7 years ago
- Hierarchical phrase-based machine translation system ☆32 · Updated 10 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 · Updated last year
- TiMBL implements several memory-based learning algorithms. ☆52 · Updated last week
- A spell-checker extending Peter Norvig's with multi-typo correction, hamming distance weighting, and more. ☆98 · Updated 4 years ago
- Machine translation for the real world ☆23 · Updated 5 years ago
- Scrapes some Finnish word definitions from English Wiktionary. ☆8 · Updated last year
- Basic dataset for the linguistic data collection. ☆15 · Updated 8 years ago
- The CIS OCR PostCorrectionTool ☆42 · Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆64 · Updated last year
- python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi… ☆18 · Updated last month
- Wikipedia API wrapper for humans and elk. (en.wikipedia.org/w/api.php, get it?) ☆36 · Updated 10 years ago
- A trend viewer written in Python/JavaScript ☆21 · Updated 7 months ago
- A web-based, token-level annotation tool for non-standard language data ☆10 · Updated 4 years ago
- A tool for analyzing the word histories of a text. ☆34 · Updated 6 months ago
- Polyglot is a language identifier for detecting text documents containing text written in more than one language, and for identifying the… ☆33 · Updated 8 years ago
- A Python toolkit converting pronunciation in enwiktionary xml dump to cmudict format ☆33 · Updated 5 years ago
- Generic Environment for Context-Aware Correction of Orthography ☆22 · Updated 2 years ago
- Building and Using A Seed Corpus for the Human Language Project ☆11 · Updated 7 years ago
- A powerful, tagset-independent and theory-neutral meta model and API for storing, manipulating, and representing nearly all types of ling… ☆15 · Updated 2 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆61 · Updated 3 months ago
- (no description) ☆26 · Updated 6 years ago
- A Python package for cleaning Gutenberg books and datasets ☆34 · Updated last month
- Convert a corpus of PDFs to clean text files on a distributed architecture ☆39 · Updated last year
- OCRopus model for Gothic print (Fraktur) ☆18 · Updated 5 years ago
- A tool for automatic spelling normalization ☆20 · Updated 4 years ago
- IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes). ☆32 · Updated 6 years ago