A fast and friendly PDF scraping library.
☆ 783; last updated Oct 17, 2023
Alternatives and similar repositories for pdfquery
Users who are interested in pdfquery compare it to the libraries listed below.
- The simplest way to extract text from PDFs in Python (☆ 428; last updated Jul 7, 2022)
- Investigative tool for extracting relevant areas from many documents (☆ 14; last updated Nov 17, 2015)
- Python PDF Parser (not actively maintained). Check out pdfminer.six. (☆ 5,302; last updated Dec 7, 2022)
- pdfrw is a pure Python library that reads and writes PDFs (☆ 1,911; last updated Apr 29, 2024)
- Community-maintained fork of pdfminer: we fathom PDF (☆ 6,909; updated this week)
- Simple wrapper of tabula-java: extract tables from PDFs into pandas DataFrames (☆ 2,315; last updated Dec 5, 2024)
- Tables is a simple command-line tool and powerful library for importing data like a CSV or JSON file into relational tables (☆ 88; last updated Dec 10, 2022)
- Tabula is a tool for liberating data tables trapped inside PDF files (☆ 7,334; last updated Mar 14, 2025)
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables. (☆ 9,774; last updated Jan 28, 2026)
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. (☆ 2,254; last updated Jun 24, 2022)
- Presentation for the NYU Data Lab, December 2015 (☆ 14; last updated Dec 2, 2015)
- Archive of political ad data from the Federal Communications Commission (☆ 20; last updated Oct 25, 2017)
- No description provided (☆ 23; last updated Mar 7, 2015)
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files (☆ 9,839; updated this week)
- A how-to for doing a mass collection of FEC data using the command line and regular expressions (☆ 29; last updated Mar 18, 2016)
- extract text from any document. no muss. no fuss. (☆ 4,458; last updated Feb 4, 2026)
- Code for extracting data from a large number of PDFs, particularly FCC political ad documents (☆ 15; last updated Oct 26, 2017)
- A more complete example of programming with PDFMiner, which continues where the default documentation stops (☆ 216; last updated Dec 3, 2019)
- WSGI profiling middleware: capture cProfiles with request data. (☆ 14; last updated Oct 28, 2014)
- For watching a set of URLs and notifying someone when something has changed. (☆ 32; last updated Jun 12, 2017)
- POLITICO's system for managing civic data (☆ 20; last updated Dec 7, 2022)
- A simple command line interface to the datamade/dedupe library. (☆ 43; last updated Dec 26, 2022)
- Add state and county FIPS codes to data (☆ 43; last updated Sep 4, 2025)
- Analysis behind "How the Cook County Assessor Failed Taxpayers" (☆ 22; last updated Dec 6, 2017)
- pneumatic is a bulk-upload library for DocumentCloud. (☆ 22; last updated Sep 6, 2020)
- Extract tables from PDF pages. (☆ 299; last updated Jun 25, 2020)
- Camelot: PDF Table Extraction for Humans (☆ 3,717; last updated Jan 5, 2023)
- Turn raw electronic FEC filings into meaningful data (☆ 19; last updated May 20, 2016)
- A repository of journalists' lookup tables. (☆ 107; last updated Apr 26, 2017)
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. (☆ 460; last updated Aug 3, 2023)
- NICAR 2016 talk about PDFs! (☆ 63; last updated Mar 12, 2016)
- No description provided (☆ 14; last updated Jun 6, 2017)
- Nice and simple US state projections for D3 (☆ 27; last updated May 14, 2016)
- A web interface to extract tabular data from PDFs (☆ 1,792; last updated Jan 3, 2025)
- A Python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution. (☆ 4,438; last updated Jul 29, 2025)
- A Python tool to help extract information from structured PDFs. (☆ 427; last updated Feb 23, 2026)
- Binary Python bindings for Poppler utils for content extraction (☆ 42; last updated May 12, 2021)
- A Python wrapper for the OpenFEC API. (☆ 28; last updated Nov 12, 2019)
- Collecting various d3.js tricks (☆ 12; last updated Sep 23, 2015)