alephdata / pdflib
Binary Python bindings for poppler utils for content extraction
☆42 · Updated 3 years ago
Alternatives and similar repositories for pdflib:
Users who are interested in pdflib are comparing it to the libraries listed below.
- A Datasette plugin providing an MLOps platform to train, eval and predict machine learning models ☆16 · Updated last month
- Provide partial dates and retain the date precision through processing ☆13 · Updated 2 years ago
- Python wrapper for a C++ Double Metaphone ☆15 · Updated 2 years ago
- A maximum-strength name parser for record linkage. ☆37 · Updated this week
- Inspect a URL and estimate if it contains a news story ☆39 · Updated 5 months ago
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything. ☆18 · Updated 2 years ago
- agate-sql adds SQL read/write support to agate. ☆18 · Updated 2 months ago
- An alpha project combining beneficial ownership and contracting data ☆13 · Updated 3 years ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- Render a map for any query with a geometry column ☆26 · Updated 8 months ago
- Add website scraping abilities to Datasette ☆62 · Updated 2 years ago
- Scalable String Similarity Joins in Python ☆39 · Updated 9 months ago
- Python language parser for a tabular format for structured metadata. http://metatab.org ☆18 · Updated last year
- Generate Pandas frames, load and extract data, based on JSON Table Schema descriptors. ☆52 · Updated 3 years ago
- Docker Container for a Make-based, PDF extraction using OCR ☆12 · Updated 9 months ago
- Python library and command line tool for converting data from one format to another ☆99 · Updated 4 years ago
- A tool for telling stories with maps. ☆27 · Updated 7 months ago
- A repository of materials for a proposed class on automated story bots. ☆49 · Updated 6 years ago
- Generate SQL tables, load and extract data, based on JSON Table Schema descriptors. ☆62 · Updated last year
- Creating user interfaces for data science with Jupyter widgets ☆11 · Updated 7 years ago
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated 2 months ago
- Auto-generate Python APIs from JSON schema specifications ☆80 · Updated 5 years ago
- Command line tool to convert spreadsheets to databases, made for the UK's Office for National Statistics. ☆80 · Updated last year
- Basic cookiecutter template for Python projects ☆21 · Updated 7 months ago
- Transform flat data structures into nested object graphs matching JSON schema definitions. ☆28 · Updated 8 years ago
- Write Datasette canned queries as plain SQL files ☆13 · Updated 2 years ago
- A browser user interface for manual labeling of record pairs. ☆47 · Updated last year
- A simple command line interface to the datamade/dedupe library. ☆42 · Updated 2 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 · Updated 10 years ago
- Datasette plugin for serving media based on a SQL query ☆18 · Updated 2 years ago