18F / doc_processing_toolkit
Python library to extract text from PDF, and default to OCR when text extraction fails.
☆62 · Updated 8 years ago
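The "extract text first, fall back to OCR" behavior in the description above can be sketched roughly as follows. This is a minimal illustration of the general pattern, not the toolkit's actual API: the function name `extract_with_ocr_fallback`, the `min_chars` threshold, and the two injected callables are all hypothetical, and a real setup would plug in its own text extractor and OCR engine.

```python
from typing import Callable, Tuple


def extract_with_ocr_fallback(
    pdf_path: str,
    extract_text: Callable[[str], str],  # hypothetical: direct text extraction
    run_ocr: Callable[[str], str],       # hypothetical: OCR-based extraction
    min_chars: int = 1,                  # assumed threshold for "extraction worked"
) -> Tuple[str, str]:
    """Return (text, method), where method is "text" or "ocr"."""
    try:
        text = extract_text(pdf_path)
    except Exception:
        # Treat any extractor error as a failed extraction.
        text = ""

    # Accept the direct result only if it has enough non-whitespace content.
    if len(text.strip()) >= min_chars:
        return text, "text"

    # Otherwise fall back to OCR on the same file.
    return run_ocr(pdf_path), "ocr"
```

Passing the extractors in as callables keeps the sketch independent of any particular PDF or OCR library; scanned PDFs often yield empty or whitespace-only text, which is why the check strips whitespace before comparing against the threshold.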
Alternatives and similar repositories for doc_processing_toolkit
Users who are interested in doc_processing_toolkit are comparing it to the libraries listed below.
- Please check out https://github.com/18F/foia-hub/issues to track our work. This repo is for project-wide discussion, blogging, and scratc… ☆51 · Updated 7 years ago
- A basic spreadsheet to api engine ☆43 · Updated 6 years ago
- Collecting reports from Inspectors General across the US federal government. ☆111 · Updated 4 years ago
- Scrapers for US municipal governments. ☆104 · Updated last month
- Parser and standardizer for politician, individual and organization names. ☆129 · Updated 8 years ago
- Turns legal citations in the DOM into links ☆20 · Updated 8 years ago
- Legal codes, for humans. ☆260 · Updated 4 years ago
- A complete agency API program. ☆12 · Updated 8 years ago
- Importer for US Spending data ☆34 · Updated 11 years ago
- (DEPRECATED) Parser for U.S. federal regulations and other regulatory information ☆55 · Updated 7 years ago
- “Let Me Get That Data For You” catalogs the machine-readable data on a given domain name. [RETIRED] ☆102 · Updated 10 years ago
- legacy backend for Open States ☆87 · Updated 5 years ago
- PANDA: A Newsroom Data Appliance ☆208 · Updated 3 years ago
- Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date ☆122 · Updated 3 years ago
- framework for scraping legislative/government data ☆89 · Updated last month
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆192 · Updated 4 years ago
- A Python web application for converting PDF forms into PDF-filling APIs ☆48 · Updated 5 years ago
- Python workers that collect tweets from the twitter streaming api and track deletions ☆128 · Updated 3 years ago
- Parser for U.S. federal regulations and other regulatory information ☆40 · Updated 2 years ago
- The basic code behind the @big_cases Twitter bot ☆105 · Updated 6 years ago
- A deprecated Python wrapper for the DocumentCloud API ☆62 · Updated 5 years ago
- A repository of journalist's lookup tables. ☆107 · Updated 8 years ago
- Friendly Slack bot for looking up cases ☆21 · Updated 8 years ago
- ReVAL: Reusable Validation Library - A Django App for validating data via API and web interface ☆32 · Updated 4 years ago
- Easily crowdsource the analysis of your documents ☆102 · Updated 8 years ago
- Monitor datasets, get alerts when something happens ☆210 · Updated 7 years ago
- A Flask-based static site authoring tool. ☆164 · Updated 3 years ago
- A step-by-step guide to publishing a simple news application. ☆75 · Updated 7 years ago
- ScraperWiki Python library for scraping and saving data; in maintenance mode ☆158 · Updated this week
- Federal Spending Transparency ☆56 · Updated last year