18F/doc_processing_toolkit

Python library to extract text from PDF, and default to OCR when text extraction fails.

☆62

Alternatives and similar repositories for doc_processing_toolkit

Users interested in doc_processing_toolkit are comparing it to the repositories listed below.

18F / autoapi
A basic spreadsheet to api engine
☆43 · Aug 27, 2019 · Updated 6 years ago
18F / pa11y-crawl
Crawl a site, run pa11y on every HTML page, and get the results
☆18 · Sep 27, 2016 · Updated 9 years ago
18F / tock
We use Tock to track and report our time at 18F
☆123 · Nov 6, 2025 · Updated 8 months ago
18F / api-program
A complete agency API program.
☆12 · Apr 27, 2017 · Updated 9 years ago
gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs
☆16 · Oct 7, 2019 · Updated 6 years ago
cloud-gov / cg-cron
[DEPRECATED] Run cron jobs in a Cloud Foundry app.
☆13 · Sep 6, 2017 · Updated 8 years ago
18F / open-data-maker
make it easy to turn a lot of potentially large csv files into easily accessible open data
☆197 · Nov 2, 2016 · Updated 9 years ago
18F / domain-scan
A lightweight pipeline, locally or in Lambda, for scanning things like HTTPS, third party service use, and web accessibility.
☆389 · Aug 6, 2021 · Updated 4 years ago
18F / 18f-scaffolding
A scaffold/generator to standardize 18F project setup
☆26 · Sep 9, 2019 · Updated 6 years ago
18F / linkify-citations
Turns legal citations in the DOM into links
☆20 · Mar 15, 2017 · Updated 9 years ago
jkeefe / Custom-Viewer-for-DocumentCloud
Sharing a viewer we built for WNYC.
☆12 · May 10, 2011 · Updated 15 years ago
18F / jekyll_pages_api
a Jekyll Plugin that generates a JSON file with data for all the Pages in your Site
☆44 · Aug 28, 2016 · Updated 9 years ago
18F / gapps-download
CLI downloading for google documents
☆14 · Oct 27, 2015 · Updated 10 years ago
GSA / recruiter
Embeddable forms to recruit research participants. Sends results to a Google Sheet, deployed via Google Tag Manager.
☆14 · Jun 25, 2018 · Updated 8 years ago
18F / chandika
Cloud Application Registry
☆16 · Nov 7, 2017 · Updated 8 years ago
codeforamerica / clean
Apply for CalFresh in SF
☆20 · Feb 10, 2016 · Updated 10 years ago
GovReady / 800-53-server
Prototype of making fisma 800-53 controls interactive
☆27 · Nov 8, 2020 · Updated 5 years ago
konklone / bit.voyage
Allow anyone with a modern browser to stream a 1GB, 10GB, 100GB, or 1TB file over the Internet and into a happy home.
☆32 · Oct 7, 2018 · Updated 7 years ago
statedecoded / law-identifier
A collection of regular expressions to identify references to state laws.
☆19 · Sep 28, 2015 · Updated 10 years ago
hosom / bro-scripts
Bro stuff.
☆12 · May 24, 2016 · Updated 10 years ago
opencontrol / RedHat
OpenControl content for Red Hat technologies
☆16 · Jan 20, 2020 · Updated 6 years ago
18F / samwise
Ruby access to the SAM.gov API
☆12 · Mar 25, 2017 · Updated 9 years ago
jheise / threatcmd
CLI interface to threatcrowd.org
☆21 · Jul 6, 2017 · Updated 9 years ago
18F / data-federation-project
A project focused on tools and best practices to support federated data collection efforts
☆29 · May 5, 2020 · Updated 6 years ago
18F / dolores-landingham-slack-bot
A Slack bot to welcome new 18F hires with the authority and compassion of Mrs. Landingham
☆188 · Sep 9, 2019 · Updated 6 years ago
ChicagoHarris / blobs
☆17 · Apr 5, 2016 · Updated 10 years ago
bbieniek / spacy-api-docker
spaCy REST API, wrapped in a Docker container.
☆16 · Apr 2, 2021 · Updated 5 years ago
deadlyforcedb / data-recipes
A small repo of notes and scripts for collecting data on U.S. deadly force police incidents
☆10 · Aug 9, 2015 · Updated 10 years ago
0xd34db33f / maltego-transforms
Public Maltego Transforms
☆24 · May 24, 2017 · Updated 9 years ago
18F / ReVAL
ReVAL: Reusable Validation Library - A Django App for validating data via API and web interface
☆32 · Aug 3, 2021 · Updated 4 years ago
18F / pulse
How the federal .gov domain space is doing at best practices and policies.
☆95 · Jun 9, 2020 · Updated 6 years ago
cfpb / DOCter
A Jekyll template for project documentation
☆105 · Dec 27, 2020 · Updated 5 years ago
kemitchell / wordy-words
list of English words with shorter synonyms
☆24 · Apr 6, 2026 · Updated 3 months ago
18F / an_introduction_to_python
An introduction to Python - https://www.digitalgov.gov/event/online-intro-to-python/
☆10 · Aug 2, 2017 · Updated 8 years ago
paultopia / lawpy
pythonic interface to the courtlistener api
☆20 · Oct 30, 2018 · Updated 7 years ago
18F / voyage
Allow anyone with a modern browser to stream a 1GB, 10GB, 100GB, or 1TB file over the Internet and into a happy home.
☆15 · Jun 9, 2017 · Updated 9 years ago
anseljh / casebot
Friendly Slack bot for looking up cases
☆21 · Dec 19, 2017 · Updated 8 years ago
18F / raktabija
Bootstrap AWS account with Terraform and Go.CD
☆29 · Oct 24, 2017 · Updated 8 years ago
mgwalker / emoji-and-facts
☆22 · Feb 18, 2026 · Updated 5 months ago