dannguyen / abbyy-finereader-ocr-senate
Evaluating the performance and accuracy of ABBYY FineReader's OCR on Senate Financial Disclosure scanned forms
☆129 Updated 8 years ago
Related projects:
- NICAR 2016 talk about PDFs! ☆62 Updated 8 years ago
- A collection of tools for mining government data ☆139 Updated 8 years ago
- Extract tables from PDF files ☆354 Updated 8 years ago
- search document dumps: ingest and explore in one extensible framework ☆124 Updated 4 years ago
- Repository for PyCon 2016 workshop Natural Language Processing in 10 Lines of Code ☆241 Updated 7 years ago
- Mechanical Turk on your own machine. ☆206 Updated 2 years ago
- Extract tabular data and semantically discover it with ease! (OS) ☆21 Updated 8 years ago
- online natural language processing with word vectors ☆310 Updated 2 months ago
- A library for extracting tables from PDF files ☆90 Updated 10 years ago
- Code + Jupyter notebook for analyzing and visualizing Reddit Data quickly and easily ☆112 Updated 8 years ago
- Code to transform Hillary's emails from raw PDF documents to a SQLite database ☆163 Updated 8 years ago
- using XPDF, pdftojson extracts text from PDF files as JSON, including word bounding boxes. ☆140 Updated 10 months ago
- ☆91 Updated 8 years ago
- Parser and standardizer for politician, individual and organization names. ☆128 Updated 7 years ago
- A proof of concept using IBM's Speech-to-Text API to do quick-and-dirty transcriptions ☆311 Updated 8 years ago
- A node.js library for extracting data from scanned forms. ☆117 Updated last year
- Hacker News plus topic tags. TechCrunch Disrupt NY Hackathon 2017 ☆123 Updated 6 years ago
- A framework for visualizing parent-child relationships with d3js ☆116 Updated 6 years ago
- ☆348 Updated this week
- ☆211 Updated this week
- Tool for visual exploration of complex data. ☆190 Updated 5 years ago
- We introduce TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool. TACIT's plugin architecture has three main components… ☆106 Updated 5 years ago
- Extract postal addresses from the DOM ☆65 Updated 12 years ago
- Tools to download and process name data from various sources. ☆88 Updated 10 years ago
- Supervised learning for novelty detection in text ☆79 Updated 7 years ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 6 years ago
- Python library to extract text from PDF, and default to OCR when text extraction fails. ☆60 Updated 6 years ago
- Model Training tool for MITIE ☆79 Updated 9 years ago
- Language Lego ☆142 Updated 4 years ago
- Extract tables from PDF pages. ☆274 Updated 4 years ago