dannguyen / abbyy-finereader-ocr-senate
Evaluating the performance and accuracy of ABBYY FineReader's OCR on Senate Financial Disclosure scanned forms
☆129 Updated 8 years ago
Related projects
Alternatives and complementary repositories for abbyy-finereader-ocr-senate
- Extract tables from PDF files ☆354 Updated 8 years ago
- Code to transform Hillary's emails from raw PDF documents to a SQLite database ☆162 Updated 8 years ago
- A collection of tools for mining government data ☆139 Updated 8 years ago
- A library for extracting tables from PDF files ☆90 Updated 11 years ago
- ☆89 Updated 9 years ago
- TensorFlow for AWS ☆115 Updated 9 years ago
- NICAR 2016 talk about PDFs! ☆62 Updated 8 years ago
- Repository for PyCon 2016 workshop Natural Language Processing in 10 Lines of Code ☆241 Updated 7 years ago
- online natural language processing with word vectors ☆310 Updated 4 months ago
- Parser and standardizer for politician, individual and organization names. ☆128 Updated 7 years ago
- Hacker News plus topic tags. TechCrunch Disrupt NY Hackathon 2017 ☆123 Updated 6 years ago
- Loan-level analysis of Fannie Mae and Freddie Mac data ☆216 Updated 4 years ago
- make it easy to turn a lot of potentially large csv files into easily accessible open data ☆199 Updated 8 years ago
- Extract tabular data and semantically discover it with ease! (OS) ☆21 Updated 8 years ago
- Analyze the structure and dynamics of an open source project's developer community, using graph algorithms, etc. ☆57 Updated 3 years ago
- Tool for visual exploration of complex data. ☆191 Updated 6 years ago
- Create simple APIs from CSV files ☆193 Updated 4 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆94 Updated 6 years ago
- A proof of concept using IBM's Speech-to-Text API to do quick-and-dirty transcriptions ☆311 Updated 8 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 Updated 7 years ago
- We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components… ☆107 Updated 5 years ago
- A (comprehensive) collection of open source tools used by the data community. ☆51 Updated 8 years ago
- Simple Python scripts to download all Hacker News submissions and comments and store them in a PostgreSQL database. ☆120 Updated 7 years ago
- A framework for visualizing parent-child relationships with d3js ☆116 Updated 6 years ago
- Open Research is a framework that contains documents that aid in the practice of product and customer research ☆114 Updated 8 years ago
- “Let Me Get That Data For You” catalogs the machine-readable data on a given domain name. [RETIRED] ☆102 Updated 9 years ago