qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆18 Updated last year
Alternatives and similar repositories for multipage-ocr:
Users who are interested in multipage-ocr are comparing it to the libraries listed below.
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- ☆21 Updated last month
- (BROKEN, help wanted) ☆15 Updated 9 years ago
- An alpha project combining beneficial ownership and contracting data ☆13 Updated 3 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Take streaming tweets, extract hashtags & usernames, create graph, export graphml for Gephi visualisation ☆38 Updated 11 years ago
- This page is a companion for the paper titled Towards Automatic Structuring and Semantic Indexing of Legal Documents ☆29 Updated 6 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Installer for Thymeflow, a personal knowledge management system. ☆33 Updated 6 years ago
- Data notification service: subscribe to keywords and get notified whenever an open data sources mentions that keyword. ☆24 Updated 11 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 7 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 Updated 9 years ago
- RESTful API around the PETRARCH coding software ☆10 Updated 3 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆15 Updated 5 years ago
- Stylometric framework in Python ☆17 Updated 9 years ago
- A platform for tools that do stuff with data ☆56 Updated 6 years ago
- This is a REST Server endpoint built using Flask and Python. ☆23 Updated 2 years ago
- modification of bibliotools 2.2 from Sébastian Grauwin ☆11 Updated 5 years ago
- Convert text from PDF to XML. ☆45 Updated 6 years ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- Tools for working with Optical Character Recognition output ☆16 Updated 11 years ago
- Monitor datasets, get alerts when something happens ☆210 Updated 6 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 6 years ago
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆28 Updated this week
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Tools for analyzing the Hillary Clinton emails ☆13 Updated 8 years ago
- A space for code and projects around analysing news content ☆23 Updated 7 years ago
- A raspberry pi 64bit image with spacy and neuralcoref pre-installed ☆21 Updated 5 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- extract difference between two html pages ☆32 Updated 6 years ago