Python library that extracts text from PDFs and falls back to OCR when text extraction fails.
☆62 · Oct 6, 2017 · Updated 8 years ago
Alternatives and similar repositories for doc_processing_toolkit
Users who are interested in doc_processing_toolkit are comparing it to the libraries listed below.
- A basic spreadsheet to API engine ☆43 · Aug 27, 2019 · Updated 6 years ago
- A complete agency API program. ☆12 · Apr 27, 2017 · Updated 8 years ago
- Crawl a site, run pa11y on every HTML page, and get the results ☆18 · Sep 27, 2016 · Updated 9 years ago
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 8 years ago
- We use Tock to track and report our time at 18F ☆125 · Nov 6, 2025 · Updated 3 months ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs ☆16 · Oct 7, 2019 · Updated 6 years ago
- [DEPRECATED] Run cron jobs in a Cloud Foundry app. ☆13 · Sep 6, 2017 · Updated 8 years ago
- Sharing a viewer we built for WNYC. ☆12 · May 10, 2011 · Updated 14 years ago
- List of English words with shorter synonyms ☆24 · Jun 2, 2019 · Updated 6 years ago
- Term List Matching Plugin for ElasticSearch ☆26 · Jan 20, 2014 · Updated 12 years ago
- Python interface for the Berkeley Parser using JPype ☆12 · Dec 18, 2015 · Updated 10 years ago
- A Python script that looks for special lines in a markdown file and uses those lines to convert, clean up, and insert content from URLs i… ☆16 · Dec 9, 2012 · Updated 13 years ago
- Utility to re-structure research papers published in US Letter or A4 format PDF files, typically to remove the two-column layout. ☆53 · Nov 8, 2010 · Updated 15 years ago
- Ruby access to the SAM.gov API ☆12 · Mar 25, 2017 · Updated 8 years ago
- A scaffold/generator to standardize 18F project setup ☆26 · Sep 9, 2019 · Updated 6 years ago
- ☆11 · Sep 29, 2015 · Updated 10 years ago
- An introduction to Python - https://www.digitalgov.gov/event/online-intro-to-python/ ☆10 · Aug 2, 2017 · Updated 8 years ago
- A simple script to look for and process all the federal data.json data inventories. ☆46 · Mar 10, 2015 · Updated 10 years ago
- A Jekyll plugin that generates a JSON file with data for all the Pages in your Site ☆44 · Aug 28, 2016 · Updated 9 years ago
- A project focused on tools and best practices to support federated data collection efforts ☆29 · May 5, 2020 · Updated 5 years ago
- A lightweight pipeline, locally or in Lambda, for scanning things like HTTPS, third party service use, and web accessibility. ☆388 · Aug 6, 2021 · Updated 4 years ago
- A simple example using R and D3.js to show the examples from the SNA course on Coursera ☆32 · Jun 23, 2016 · Updated 9 years ago
- Prototype of making FISMA 800-53 controls interactive ☆27 · Nov 8, 2020 · Updated 5 years ago
- An entirely unofficial look at the technology stacks of the 2016 Presidential Campaigns ☆14 · Mar 2, 2016 · Updated 9 years ago
- Embeddable forms to recruit research participants. Sends results to a Google Sheet, deployed via Google Tag Manager. ☆14 · Jun 25, 2018 · Updated 7 years ago
- phpAudit is a simple shell script that scans PHP files for possible security risks. ☆26 · Apr 7, 2013 · Updated 12 years ago
- ReVAL: Reusable Validation Library - A Django App for validating data via API and web interface ☆32 · Aug 3, 2021 · Updated 4 years ago
- Web scraping engines with Python and Scrapy ☆33 · Sep 24, 2020 · Updated 5 years ago
- A collection of small scripts to do various things ☆32 · Jun 29, 2015 · Updated 10 years ago
- Investigative tool for extracting relevant areas from many documents ☆14 · Nov 17, 2015 · Updated 10 years ago
- ☆17 · Apr 5, 2016 · Updated 9 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 · Sep 14, 2016 · Updated 9 years ago
- A Jekyll template for project documentation ☆106 · Dec 27, 2020 · Updated 5 years ago
- How the federal .gov domain space is doing at best practices and policies. ☆95 · Jun 9, 2020 · Updated 5 years ago
- Allow anyone with a modern browser to stream a 1GB, 10GB, 100GB, or 1TB file over the Internet and into a happy home. ☆15 · Jun 9, 2017 · Updated 8 years ago
- NICAR 2016 talk about PDFs! ☆63 · Mar 12, 2016 · Updated 9 years ago
- Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP) ☆37 · Sep 24, 2015 · Updated 10 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆55 · Jan 30, 2022 · Updated 4 years ago
- A Slack bot to welcome new 18F hires with the authority and compassion of Mrs. Landingham ☆189 · Sep 9, 2019 · Updated 6 years ago