qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆19 Updated 2 years ago
Alternatives and similar repositories for multipage-ocr
Users who are interested in multipage-ocr are comparing it to the libraries listed below:
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆30 Updated last month
- Python wrapper for xpdf ☆19 Updated 6 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Convert text from PDF to XML. ☆45 Updated 7 years ago
- Orange Data Mining Homepage ☆17 Updated 6 years ago
- 🍊 Prototype Orange widgets — only for the brave. ☆12 Updated 3 months ago
- A place to collect and share knowledge about liberating data from PDFs ☆55 Updated 3 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- Monitor datasets, get alerts when something happens ☆210 Updated 7 years ago
- Run Overview on your own system ☆129 Updated 4 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆34 Updated 4 years ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆192 Updated 4 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆15 Updated 6 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆47 Updated 4 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated 2 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆58 Updated last year
- Easily display Zotero items on a webpage ☆33 Updated 2 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆69 Updated 2 years ago
- Now included in rigour ☆152 Updated 2 weeks ago
- Date parsing and normalization utilities for Python. ☆22 Updated 2 years ago
- Palladio Application ☆43 Updated 4 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 Updated 3 years ago
- Trying to generate name synonyms from wikidata ☆34 Updated 5 years ago
- Simple taxonomy management tool and document classifier. ☆56 Updated 5 years ago
- FreeQDA ☆29 Updated 5 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated last year
- Ergonomic line-by-line transcription of scanned text. ☆54 Updated 4 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆275 Updated 3 years ago
- Ideas for (tech) stuff to research, build or work on. ☆50 Updated 11 months ago
- (BROKEN, help wanted) ☆15 Updated 9 years ago