alephdata / pdflib
Binary Python bindings for poppler utils for content extraction
☆42 Updated 3 years ago
Alternatives and similar repositories for pdflib:
Users that are interested in pdflib are comparing it to the libraries listed below
- Python wrapper for a C++ Double Metaphone ☆15 Updated last year
- Provide partial dates and retain the date precision through processing ☆13 Updated 2 years ago
- A maximum-strength name parser for record linkage. ☆36 Updated 5 months ago
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆15 Updated last week
- A Python library for defining rule-based overrides on messy data ☆13 Updated last month
- Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors. ☆52 Updated 3 years ago
- A browser user interface for manual labeling of record pairs. ☆42 Updated last year
- International Address formatter which considers the standard formatting rules of the country ☆26 Updated 3 years ago
- (Archived) A Python library for record linkage and deduplication. ☆19 Updated 9 months ago
- Extract networks of entities from journalistic reporting ☆47 Updated last year
- Datasette plugin for visualizing data using Vega ☆58 Updated last year
- Utility library to turn country names into ISO two-letter codes ☆66 Updated last month
- THIS REPOSITORY IS FORK ☆30 Updated last year
- Execute OpenRefine JSON scripts without OpenRefine (or Java) ☆29 Updated 2 years ago
- Transform flat data structures into nested object graphs matching JSON schema definitions. ☆28 Updated 8 years ago
- Set-oriented Operations in Pandas ☆24 Updated 4 years ago
- Adds read support for Excel files (xls and xlsx) to agate. ☆17 Updated 10 months ago
- Slideshow template for Voilà based on RevealJS ☆16 Updated 3 years ago
- Python language parser for a tabular format for structured metadata. http://metatab.org ☆17 Updated last year
- Just charts. Really. ☆22 Updated last year
- A repository of materials for a proposed class on automated story bots. ☆49 Updated 6 years ago
- Add website scraping abilities to Datasette ☆62 Updated last year
- Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex. ☆60 Updated this week
- Dump (freeze) SQL query results from a database into a selection of file formats ☆92 Updated 5 years ago
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification. ☆23 Updated 11 months ago
- A markdown wiki and dashboarding system for Datasette ☆21 Updated 3 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated last month
- Sidewall is a Python library for interacting with the Dimensions search API. ☆17 Updated 4 months ago