alephdata / pdflib
Binary Python bindings for the Poppler utilities, for content extraction.
☆42 · Updated 3 years ago
Alternatives and similar repositories for pdflib:
Users who are interested in pdflib are comparing it to the libraries listed below.
- Python wrapper for a C++ Double Metaphone implementation ☆15 · Updated 2 years ago
- Provide partial dates and retain the date precision through processing ☆13 · Updated 2 years ago
- A maximum-strength name parser for record linkage. ☆36 · Updated last week
- A Datasette plugin providing an MLOps platform to train, evaluate, and run predictions with machine learning models ☆16 · Updated 2 weeks ago
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated last month
- How can we improve name matching in screening tools? ☆12 · Updated 2 months ago
- Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors. ☆52 · Updated 3 years ago
- LoadKit supports Extract, Transform, Load processes based on ArchiveKit buckets. ☆11 · Updated 9 years ago
- An alpha project combining beneficial ownership and contracting data ☆13 · Updated 3 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · Updated 9 years ago
- Write Datasette canned queries as plain SQL files ☆13 · Updated 2 years ago
- Python language parser for a tabular format for structured metadata. http://metatab.org ☆18 · Updated last year
- A Python library for defining rule-based overrides on messy data ☆13 · Updated 4 months ago
- Render a map for any query with a geometry column ☆26 · Updated 8 months ago
- A browser user interface for manual labeling of record pairs. ☆46 · Updated last year
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 4 months ago
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything. ☆18 · Updated 2 years ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- Date parsing and normalization utilities for Python. ☆22 · Updated last year
- Commons of stupid, simple Python micro functions. Pull requests very welcome. ☆19 · Updated this week
- Manage and load dataprotocols.org Data Packages ☆27 · Updated 9 years ago
- A template for open-source Python repositories ☆23 · Updated 2 weeks ago
- Searching large heterogeneous data dumps with Universal Sentence Encoder ☆62 · Updated 3 years ago
- Generate SQL tables, load and extract data, based on JSON Table Schema descriptors. ☆62 · Updated last year
- PDF Table Extractor: a repository holding a revisable version of the code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 · Updated last year
- agate-sql adds SQL read/write support to agate. ☆18 · Updated last month
- International address formatter that applies each country's standard formatting rules ☆26 · Updated 3 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- A workflow system for Natural Language Processing. ☆21 · Updated 5 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 · Updated 5 years ago