dannguyen / abbyy-finereader-ocr-senate
Evaluating the performance and accuracy of ABBYY FineReader's OCR on Senate Financial Disclosure scanned forms
☆131 Updated 9 years ago
Alternatives and similar repositories for abbyy-finereader-ocr-senate:
Users interested in abbyy-finereader-ocr-senate are comparing it to the repositories listed below:
- A library for extracting tables from PDF files ☆90 Updated 11 years ago
- A collection of tools for mining government data ☆140 Updated 8 years ago
- Extract tables from PDF files ☆356 Updated 8 years ago
- Parser and standardizer for politician, individual and organization names. ☆129 Updated 7 years ago
- NICAR 2016 talk about PDFs! ☆62 Updated 9 years ago
- Repository for PyCon 2016 workshop Natural Language Processing in 10 Lines of Code ☆239 Updated 7 years ago
- Code to transform Hillary's emails from raw PDF documents to a SQLite database ☆161 Updated 9 years ago
- online natural language processing with word vectors ☆309 Updated 9 months ago
- Mechanical Turk on your own machine. ☆205 Updated 4 months ago
- A node.js library for extracting data from scanned forms. ☆117 Updated 2 years ago
- Code + Jupyter notebook for analyzing and visualizing Reddit Data quickly and easily ☆112 Updated 9 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆95 Updated 6 years ago
- Extract tabular data and semantically discover it with ease! (OS) ☆21 Updated 9 years ago
- TensorFlow for AWS ☆116 Updated 9 years ago
- Keshif - Data Made Explorable (Prototype) ☆457 Updated 7 years ago
- Extract postal addresses from the DOM ☆66 Updated 12 years ago
- Hacker News plus topic tags. TechCrunch Disrupt NY Hackathon 2017 ☆123 Updated 6 years ago
- ☆89 Updated 9 years ago
- Language Lego ☆141 Updated 5 years ago
- A proof of concept using IBM's Speech-to-Text API to do quick-and-dirty transcriptions ☆312 Updated 8 years ago
- A Python web application for converting PDF forms into PDF-filling APIs ☆46 Updated 4 years ago
- Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard. ☆84 Updated 9 years ago
- Python library to extract text from PDF, and default to OCR when text extraction fails. ☆62 Updated 7 years ago
- Find the essence ☆109 Updated 9 years ago
- A framework for visualizing parent-child relationships with d3js ☆116 Updated 7 years ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆294 Updated 2 years ago
- A python server harnessing the calculational ability of LibreOffice Calc (thanks to 'pyoo'). It provides 'instant' access to the cell ran… ☆138 Updated last year
- Uber web interface crawler / scraper - Convert the trips table into a CSV file ☆41 Updated 6 years ago
- Extract tables from PDF pages. ☆287 Updated 4 years ago
- Supervised learning for novelty detection in text ☆78 Updated 8 years ago