qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆18 Updated 2 years ago
Alternatives and similar repositories for multipage-ocr
Users that are interested in multipage-ocr are comparing it to the libraries listed below
- Prototype Orange widgets, only for the brave. ☆12 Updated this week
- Convert text from PDF to XML. ☆45 Updated 6 years ago
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆29 Updated 2 months ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- (BROKEN, help wanted) ☆15 Updated 9 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated 2 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- How to handle emoji in Python + a quick Python script to count emoji in Tweets as an example. (python 2.7) ☆13 Updated 9 years ago
- Ergonomic line-by-line transcription of scanned text. ☆53 Updated 4 years ago
- Scraping Tweet data for Russian Troll Twitter accounts into Neo4j ☆57 Updated 7 years ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆189 Updated 4 years ago
- Copyleaks finds plagiarism online using copyright infringement detection technology. Find those who have used your content with Copyleaks… ☆103 Updated this week
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆85 Updated 5 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 Updated 2 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 Updated last year
- Orange Data Mining Homepage ☆17 Updated 5 years ago
- ☆35 Updated last year
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆46 Updated 3 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆56 Updated last year
- framework for scraping legislative/government data ☆88 Updated 11 months ago
- Monitor datasets, get alerts when something happens ☆210 Updated 6 years ago
- Run Overview on your own system ☆126 Updated 4 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆14 Updated 6 months ago
- Easily display Zotero items on a webpage ☆32 Updated 2 years ago
- Data fusion add-on for Orange3 ☆16 Updated 5 years ago
- Installer for Thymeflow, a personal knowledge management system. ☆34 Updated 7 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆18 Updated 10 years ago
- Source code for the Twitter Hybrid Sentiment Classifier used in Semeval 2014 competition. (Sentiment Analysis system) ☆13 Updated 11 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆54 Updated 3 years ago
- A scraper focused on organizational Github accounts and their members. ☆42 Updated 3 years ago