alephdata / pdflib
Binary Python bindings for poppler utils for content extraction
☆42 Updated 4 years ago
Alternatives and similar repositories for pdflib
Users who are interested in pdflib are comparing it to the libraries listed below.
- Provide partial dates and retain the date precision through processing ☆13 Updated 2 years ago
- Python wrapper for a C++ Double Metaphone ☆15 Updated this week
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆16 Updated this week
- A maximum-strength name parser for record linkage. ☆37 Updated last week
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything. ☆18 Updated 2 years ago
- An alpha project combining beneficial ownership and contracting data ☆13 Updated 4 years ago
- How can we improve name matching in screening tools? ☆13 Updated 4 months ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Datasette plugin for visualizing data using Vega ☆58 Updated last year
- Render a map for any query with a geometry column ☆26 Updated 10 months ago
- International Address formatter which considers the standard formatting rules of the country ☆26 Updated 3 years ago
- Transform flat data structures into nested object graphs matching JSON schema definitions. ☆28 Updated 8 years ago
- Scalable String Similarity Joins in Python ☆39 Updated 11 months ago
- Extract, parse and populate templates from strings ☆27 Updated 6 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Extract networks of entities from journalistic reporting ☆48 Updated last year
- Add editing UI and other power-user features to Datasette. ☆12 Updated 2 years ago
- Basic cookiecutter template for Python projects ☆21 Updated 8 months ago
- Utility library to turn country names into ISO two-letter codes ☆69 Updated last week
- Write Datasette canned queries as plain SQL files ☆13 Updated 2 years ago
- A Python module that will check for package updates. ☆28 Updated 3 years ago
- LoadKit supports Extract, Transform, Load processes based on ArchiveKit buckets. ☆11 Updated 10 years ago
- A Python library for defining rule-based overrides on messy data ☆16 Updated 2 months ago
- Web interface for network analysis. ☆21 Updated 2 years ago
- Python binding for gumbo-parser using Cython ☆14 Updated 8 years ago
- Generate SQL tables, load and extract data, based on JSON Table Schema descriptors. ☆62 Updated last year
- This repo contains the draft, images, and code for the Medium blog post on Altair themes. ☆12 Updated 6 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated last year
- A repository of materials for a proposed class on automated story bots. ☆49 Updated 6 years ago
- JavaScript multivariate data visualization ☆14 Updated 8 years ago