okfn / pdftables
A library for extracting tables from PDF files
☆90Updated 11 years ago
Alternatives and similar repositories for pdftables:
Users that are interested in pdftables are comparing it to the libraries listed below
- Extract tables from PDF pages.☆283Updated 4 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…☆48Updated 3 years ago
- Modularly extensible semantic metadata validator☆83Updated 9 years ago
- SolrClient is a simple Python library for Solr, built in Python 3 with support for the latest features of Solr.☆63Updated 5 years ago
- pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apart…☆100Updated 5 years ago
- The OpenRefine Python Client Library provides an interface for communicating with an OpenRefine server.☆175Updated 5 years ago
- Python binding to libpoppler with focus on text extraction☆97Updated 3 years ago
- A Python library to detect and extract listing data from HTML pages.☆108Updated 7 years ago
- Find which links on a web page are pagination links☆29Updated 8 years ago
- Python bindings to the Tesseract API☆66Updated 8 years ago
- Structured Data from PDF image-based files☆88Updated 11 years ago
- A Python library for extracting semantic information from text, such as dates and numbers.☆75Updated 2 years ago
- Dynamic data analysis over the web. The logic to your data dashboards.☆156Updated 10 years ago
- Python bindings to the Compact Language Detector☆33Updated 4 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD]☆108Updated 11 years ago
- A polite, minimal interface for sending python objects to and from Amazon S3.☆57Updated 8 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine☆72Updated 7 years ago
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.☆38Updated 6 years ago
- General Architecture for Text Engineering☆48Updated 8 years ago
- Tools for parsing messy tabular data. This is now superseded by https://github.com/frictionlessdata/tabulator-py☆388Updated last year
- This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet…☆29Updated 2 months ago
- Evaluating the performance and accuracy of ABBYY FineReader's OCR on Senate Financial Disclosure scanned forms☆130Updated 8 years ago
- ☆50Updated last year
- Navigating around a grid of cells like XPath for spreadsheets; supports Python 3.5+☆47Updated 2 years ago
- scraper related helper functions☆27Updated 10 years ago
- A Python bot that edits Wikipedia and interacts over IRC☆40Updated 3 months ago
- Data analysis tool.☆84Updated last year
- legacy backend for Open States☆87Updated 5 years ago
- A simple viewer and inspection tool for text boxes in PDF documents☆94Updated 2 years ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing.☆147Updated last month