jsfenfen / whatwordwhere
Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the hOCR standard.
☆84 · Updated 8 years ago
Related projects
Alternatives and complementary repositories for whatwordwhere
- Docker container to provide Apache Tika RESTful API ☆40 · Updated 8 years ago
- An expandable and scalable OCR pipeline ☆86 · Updated 7 years ago
- Next generation OCR engine based on LSTMs. ☆52 · Updated 6 years ago
- TSM - Twitter Subgraph Manipulator ☆83 · Updated 4 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆53 · Updated 2 years ago
- NICAR 2016 talk about PDFs! ☆62 · Updated 8 years ago
- Structured Data from PDF image-based files ☆87 · Updated 11 years ago
- Tools for working with Optical Character Recognition output ☆16 · Updated 10 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆34 · Updated 9 years ago
- An online annotation platform for teaching and learning in the humanities. ☆106 · Updated 3 weeks ago
- This is a REST Server endpoint built using Flask and Python. ☆24 · Updated 2 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 · Updated 9 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆37 · Updated 8 months ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 · Updated 8 years ago
- A library for extracting tables from PDF files ☆90 · Updated 11 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated last year
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched ☆260 · Updated 8 years ago
- GermaNER: Free Open German Named Entity Recognition Tool ☆36 · Updated 11 months ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆291 · Updated 9 years ago
- Demonstration of how dedupe might be used as geocoder ☆17 · Updated 2 years ago
- Supervised learning for novelty detection in text ☆79 · Updated 8 years ago
- Sometimes you just need a lot of text. Plainstream is a small Python app that provides you with a plain text stream directly from Wikiped… ☆24 · Updated last year
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 2 years ago
- rapid nlp prototyping ☆72 · Updated 2 years ago
- A trend viewer written in Python/JavaScript ☆21 · Updated last week