okfn / pdftables
A library for extracting tables from PDF files
☆90 Updated 11 years ago
Related projects
Alternatives and complementary repositories for pdftables
- Python library with common functionality for writing web scrapers ☆102 Updated 9 years ago
- Extract tables from PDF pages. ☆276 Updated 4 years ago
- Python library and command line tool for converting data from one format to another ☆100 Updated 4 years ago
- A library for extracting tables from PDF files ☆87 Updated 4 years ago
- Creating Rickshaw.js visualizations with Python Pandas ☆266 Updated 8 years ago
- Extract tables from PDF files ☆354 Updated 8 years ago
- Parser and standardizer for politician, individual and organization names. ☆128 Updated 7 years ago
- Tools for parsing messy tabular data. This is now superseded by https://github.com/frictionlessdata/tabulator-py ☆390 Updated last year
- Find which links on a web page are pagination links ☆29 Updated 7 years ago
- Modularly extensible semantic metadata validator ☆83 Updated 8 years ago
- A polite, minimal interface for sending python objects to and from Amazon S3. ☆57 Updated 8 years ago
- The OpenRefine Python Client Library provides an interface for communicating with an OpenRefine server. ☆176 Updated 5 years ago
- ScraperWiki Python library for scraping and saving data ☆159 Updated last year
- legacy backend for Open States ☆87 Updated 4 years ago
- An expandable and scalable OCR pipeline ☆86 Updated 6 years ago
- A (comprehensive) collection of open source tools used by the data community. ☆51 Updated 8 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆92 Updated 2 years ago
- DEPRECATED - name_tools for Open States and other projects ☆19 Updated 4 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 2 years ago
- A data processing pipeline that schedules and runs content harvesters, normalizes their data, and outputs that normalized data to a varie… ☆41 Updated 8 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated 7 months ago
- A Python library to detect and extract listing data from HTML pages. ☆109 Updated 7 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated last year
- A simple Python library/tool for pulling location information from unstructured text ☆185 Updated 13 years ago
- Scrapes sites. Gets news. Eventually events. ☆81 Updated 8 years ago
- Python library for creating word clouds from text ☆51 Updated 5 years ago
- Docker container to provide Apache Tika RESTful API ☆40 Updated 8 years ago
- Street address parser and formatter ☆92 Updated 5 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platforms including WordPress, … ☆21 Updated 3 years ago
- Python 3 AsyncIO powered scraping framework with batteries included ☆20 Updated 8 years ago