qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆19 Updated 2 years ago
Alternatives and similar repositories for multipage-ocr
Users who are interested in multipage-ocr are comparing it to the libraries listed below.
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆30 Updated 3 months ago
- Prototype Orange widgets, only for the brave. ☆12 Updated 5 months ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆69 Updated 2 years ago
- (BROKEN, help wanted) ☆15 Updated 9 years ago
- Python wrapper for xpdf ☆19 Updated 6 years ago
- Convert text from PDF to XML. ☆45 Updated 7 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆14 Updated 11 months ago
- Easily display Zotero items on a webpage ☆32 Updated 2 years ago
- Installer for Thymeflow, a personal knowledge management system. ☆36 Updated 7 years ago
- Date parsing and normalization utilities for Python. ☆22 Updated 2 years ago
- Text Mining add-on for Orange3 ☆131 Updated 2 months ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆89 Updated 6 years ago
- Python bindings for Apache Tika ☆24 Updated 5 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆18 Updated 11 years ago
- PST extraction and analytic pipeline ☆37 Updated 7 years ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆193 Updated 4 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆47 Updated 4 years ago
- Soundex Phonetic Code Algorithm Demo for Indian Languages. Supports all Indian languages and English. Provides intra-indic string compari… ☆59 Updated 6 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆34 Updated 4 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- Tools for tracking stories on news homepages ☆48 Updated 6 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 Updated 6 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 Updated 3 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆55 Updated 4 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆59 Updated last year
- ☆35 Updated 2 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆58 Updated 2 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆96 Updated 3 years ago
- Resources, notebooks, assets for ML for Everyone Twitch stream ☆14 Updated 5 years ago