qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆18 Updated last year
Related projects
Alternatives and complementary repositories for multipage-ocr
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆15 Updated 5 years ago
- python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi… ☆18 Updated 3 weeks ago
- Tools for analyzing the Hillary Clinton emails ☆13 Updated 8 years ago
- RESTful API around the PETRARCH coding software ☆10 Updated 3 years ago
- 🧮 Python package to construct word embeddings for small data using PMI and SVD ☆16 Updated 4 years ago
- NYT Risk Semantics Project ☆12 Updated 8 years ago
- Stylometric framework in Python ☆13 Updated 9 years ago
- Getting, analysing and displaying lists of papers ☆13 Updated last month
- Convert text from PDF to XML. ☆45 Updated 6 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆37 Updated 8 months ago
- Ergonomic line-by-line transcription of scanned text. ☆48 Updated 3 years ago
- Sanskrit Corpus ☆15 Updated 8 years ago
- Language-agnostic political event coding using universal dependencies ☆18 Updated 5 years ago
- Easily display Zotero items on a webpage ☆32 Updated last year
- A digital humanities operating system that runs on a USB disk. ☆31 Updated 7 years ago
- Code for the paper "Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift… ☆16 Updated 7 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- Soundex Phonetic Code Algorithm Demo for Indian Languages. Supports all Indian languages and English. Provides intra-indic string compari… ☆55 Updated 5 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆53 Updated 4 months ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 Updated 5 years ago
- Turning news into events since 2014. ☆50 Updated 7 years ago
- Atom/Electron Application for calling PanDoc Converter with Shell Commands on Linux, Windows, and Mac ☆16 Updated 3 years ago
- Topic Modeling Workflow in Python ☆16 Updated last year
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆18 Updated 10 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 Updated 9 years ago
- Automated NLP sentiment predictions: batteries included, or use your own data ☆18 Updated 6 years ago
- Visual analytics application for qualitative text analysis ☆24 Updated last year
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated 9 months ago