maxpmaxp / pdfreader
Python API for PDF documents
☆118 · Updated 4 months ago
Alternatives and similar repositories for pdfreader:
Users who are interested in pdfreader are comparing it to the libraries listed below.
- A Python tool to help extracting information from structured PDFs. ☆391 · Updated 3 weeks ago
- Python binding to Poppler-cpp PDF library ☆105 · Updated 4 months ago
- A Python implementation of Lunr.js ☆194 · Updated 3 weeks ago
- Python interface to Apache PDFBox command-line tools. ☆75 · Updated 2 years ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆175 · Updated this week
- A purely-functional HTML builder for Python. Think JSX rather than templates. ☆94 · Updated 2 weeks ago
- Simple, Pythonic extraction of text, shapes and images from PDFs ☆79 · Updated 4 years ago
- Pure-python library for adding annotations to PDFs ☆199 · Updated 3 years ago
- A utility to read and write PDFs with Python ☆334 · Updated 3 years ago
- An open-source package for Python to clean raw text data ☆69 · Updated last year
- Append/Concatenate .docx documents ☆106 · Updated 6 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 · Updated last year
- Demos, examples and utilities using PyMuPDF ☆618 · Updated 6 months ago
- A lightweight, zero-dependency, minimal-functionality Excel reader/writer Python library ☆306 · Updated last year
- Pandoc (Python Library) ☆146 · Updated 4 months ago
- Highlight text in documents ☆99 · Updated last month
- Python library for fast approximate string matching using Jaro and Jaro-Winkler similarity ☆65 · Updated last year
- Parse numbers written in natural language ☆109 · Updated 3 months ago
- Library for unit extraction, a fork of quantulum for Python 3 ☆135 · Updated 7 months ago
- Simplify DOCX files to JSON ☆224 · Updated 4 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆121 · Updated 3 weeks ago
- rstr is a helper module for easily generating random strings of various types. It could be useful for fuzz testing, generating dummy data… ☆90 · Updated last year
- A Python library to make filling PDFs much easier ☆146 · Updated 5 months ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆106 · Updated 3 weeks ago
- Quickly check whether there is a visible difference between two PDFs. ☆66 · Updated 2 months ago
- A Python library for extracting text from PDFs without losing the formatting of the PDF content. ☆76 · Updated 3 years ago
- Python bindings to PDFium ☆493 · Updated this week
- A VS Code extension for annotating data with Prodigy ☆30 · Updated 3 years ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆68 · Updated 3 weeks ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆434 · Updated last year