maxpmaxp / pdfreader
Python API for PDF documents
☆124, updated last year
Alternatives and similar repositories for pdfreader
Users who are interested in pdfreader compare it to the libraries listed below.
- A Python tool to help extracting information from structured PDFs. ☆427, updated last week
- Python binding to Poppler-cpp pdf library ☆114, updated last year
- python library to simplify working with jsonlines and ndjson data ☆306, updated last year
- A Python implementation of Lunr.js 🌖 ☆202, updated 9 months ago
- Python interface to Apache PDFBox command-line tools. ☆78, updated 2 years ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆196, updated last week
- A utility to read and write PDFs with Python ☆338, updated 4 years ago
- Python library for fast approximate string matching using Jaro and Jaro-Winkler similarity ☆76, updated last year
- Python Simple Object Storage - provides a list and dictionary interface that seamlessly stores data in a file, like a simplified database… ☆58, updated 2 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆122, updated last month
- mirror of https://hg.reportlab.com/hg-public/reportlab ☆77, updated last week
- Simplify DOCX files to JSON ☆257, updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆155, updated 2 years ago
- Parse numbers written in natural language ☆124, updated last year
- Extract dates from text ☆66, updated 4 years ago
- Python library that reads JSON files of any size. ☆196, updated 2 years ago
- Pythonic search engine based on PyLucene. ☆131, updated this week
- A purely-functional HTML builder for Python. Think JSX rather than templates. ☆102, updated 11 months ago
- Pandoc (Python Library) ☆176, updated 2 months ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆68, updated 3 years ago
- Simple python wrapper to convert HTML to PDF with headless Chrome via selenium ☆74, updated last week
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆77, updated 3 weeks ago
- A Python binding of SQLite Full Text Search Tokenizer ☆48, updated last month
- Simple, Pythonic extraction of text, shapes and images from PDFs ☆80, updated 5 years ago
- An open-source package for python to clean raw text data ☆73, updated 2 years ago
- rstr is a helper module for easily generating random strings of various types. It could be useful for fuzz testing, generating dummy data… ☆98, updated 10 months ago
- Efficient string matching with regular expressions ☆146, updated last week
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want) ☆68, updated 2 years ago
- Diff Match Patch is a high-performance library in multiple languages that manipulates plain text. ☆56, updated 11 months ago
- CyDifflib is a fast implementation of difflib's algorithms, which can be used as a drop-in replacement. ☆32, updated 8 months ago