jsfenfen/whatwordwhere

Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.

☆83

Alternatives and similar repositories for whatwordwhere

Users who are interested in whatwordwhere are comparing it to the libraries listed below.

alexbyrnes / FCC-Political-Ads
Archive of political ad data from the Federal Communications Commission
☆21 · Oct 25, 2017 · Updated 8 years ago
datanews / data-inventories
A simple script to look for and process all the federal data.json data inventories.
☆46 · Mar 10, 2015 · Updated 11 years ago
opensecrets / OCRToolkit
Tools for working with Optical Character Recognition output
☆16 · Mar 7, 2014 · Updated 12 years ago
newsday / newstools-checkup
An open-source Django app to survey politicians
☆18 · Apr 23, 2014 · Updated 12 years ago
alexbyrnes / Datapiece
Investigative tool for extracting relevant areas from many documents
☆14 · Nov 17, 2015 · Updated 10 years ago
jsfenfen / parsing-prickly-pdfs
NICAR 2016 talk about PDFs!
☆63 · Mar 12, 2016 · Updated 10 years ago
cschnaars / FEC-Scraper
Scripts to scrape the FEC website and parse campaign filings
☆46 · Mar 13, 2012 · Updated 14 years ago
alexbyrnes / FCC-Political-Ads_The-Code
Code for extracting data from a large number of PDFs, particularly FCC political ad documents
☆15 · Oct 26, 2017 · Updated 8 years ago
simonw / fivethirtyeight-datasette
Code to package FiveThirtyEight data using Datasette
☆16 · Mar 5, 2026 · Updated 4 months ago
cjdd3b / fec-standardizer
An experiment to standardize individual donor names in campaign finance data using simple graph theory and machine learning.
☆64 · Jan 25, 2013 · Updated 13 years ago
bycoffe / fec-guide
☆25 · Mar 18, 2013 · Updated 13 years ago
NYPL-publicdomain / greenbook-map
Experimental interfaces that explore the Green Books collection at NYPL's Schomburg Center for Research in Black Culture
☆25 · Nov 7, 2024 · Updated last year
pioneerpress / code
Code & supporting data behind Pioneer Press stories and interactives.
☆14 · Jan 16, 2018 · Updated 8 years ago
sul-dlss-deprecated / iiifManifestLayouts
☆10 · Nov 2, 2016 · Updated 9 years ago
mhkeller / turntable
Node.js scripts for pulling data from Google Docs and uploading them to S3 with data scrubbing and moderation.
☆19 · Apr 20, 2014 · Updated 12 years ago
datanews / tables
Tables is a simple command-line tool and powerful library for importing data like a CSV or JSON file into relational tables
☆88 · Dec 10, 2022 · Updated 3 years ago
ajam / banquo
A node.js screenshot service.
☆17 · Apr 6, 2016 · Updated 10 years ago
datanews / minimaps
Minimap generator used for the 2014 NY/NJ general election. Makes tiny state .pngs highlighting individual districts.
☆35 · Nov 6, 2014 · Updated 11 years ago
anthonydb / pneumatic
pneumatic is a bulk-upload library for DocumentCloud.
☆22 · Sep 6, 2020 · Updated 5 years ago
mtigas / dump1090-stream-parser
Fork of dump1090-stream-parser. Takes SBS output from `dump1090` and puts it into a database.
☆13 · Apr 16, 2019 · Updated 7 years ago
mhkeller / tk-toolkit
The TK Toolkit. Utilities for working with data in Node.js.
☆15 · Nov 16, 2015 · Updated 10 years ago
dhess / lobbyists
Parsers and utilities for the Senate LD-1/LD-2 database.
☆16 · Jan 29, 2016 · Updated 10 years ago
rdmurphy / node-copytext
A module for accessing an XLSX spreadsheet as a JavaScript object.
☆16 · Aug 25, 2019 · Updated 6 years ago
majohansson / maria-puerto-rico
Data associated with Hurricane Maria in Puerto Rico
☆18 · Sep 17, 2018 · Updated 7 years ago
konklone / oversight.garden
Bringing together the oversight community's work.
☆26 · May 3, 2020 · Updated 6 years ago
atmccann / lean_in
☆16 · Apr 5, 2014 · Updated 12 years ago
newsdev / nyt-pyfec
A Python library for downloading, parsing and cleaning Federal Election Commission filings.
☆28 · Jan 30, 2024 · Updated 2 years ago
kevinschaul / us-abbreviations
Utility for converting between different U.S. state abbreviations.
☆16 · Oct 27, 2024 · Updated last year
gebelo / gijc
Handouts/Tipsheets for the 2015 Global Investigative Journalism Conference
☆10 · Oct 9, 2015 · Updated 10 years ago
seanherron / sheeet
Sheeet is a simple utility to take a number of Excel files and convert them to CSV.
☆20 · Feb 6, 2014 · Updated 12 years ago
newsdev / foialawya
an app for keeping track of your FOIAs and getting alerts when they're (over) due
☆54 · Apr 18, 2023 · Updated 3 years ago
flatsheet / flatsheet-prototype
The oooold prototype of flatsheet. Go here instead: http://github.com/flatsheet/flatsheet
☆41 · May 13, 2014 · Updated 12 years ago
adelevie / downlaw
Write markdown with legal citations on the left, get rendered markdown on the right. Oh, and the legal citations become links.
☆20 · May 4, 2014 · Updated 12 years ago
gwk / muck
A build tool for data projects.
☆49 · Dec 27, 2024 · Updated last year
kevinschaul / depict
Depict aims to easily render fallback images of web elements for platforms that do not run code.
☆55 · Dec 19, 2025 · Updated 7 months ago
PalmBeachPost / postgeo
Geocode CSVs and jitter overlapping points
☆22 · Jan 3, 2017 · Updated 9 years ago
jsfenfen / pdf17
nicar 17: advanced pdf manipulation
☆18 · Mar 4, 2017 · Updated 9 years ago
tilgovi / haystax
Haystax provides a simple, easy way for non-technical people to extract tabular data from Web pages. Based on Mozilla's Hackasaurus tool.
☆18 · Jun 24, 2012 · Updated 14 years ago
jze / ocropus-model_fraktur
OCRopus model for Gothic print (Fraktur)
☆19 · Feb 16, 2020 · Updated 6 years ago