maxpmaxp / pdfreader
Python API for PDF documents
☆117 Updated 2 months ago
Related projects
Alternatives and complementary repositories for pdfreader
- A Python tool to help extracting information from structured PDFs. ☆383 Updated 3 weeks ago
- Python binding to Poppler-cpp pdf library ☆98 Updated 2 months ago
- Python interface to Apache PDFBox command-line tools. ☆75 Updated last year
- mirror of https://hg.reportlab.com/hg-public/reportlab ☆69 Updated this week
- Convert html to docx ☆74 Updated 4 months ago
- A Python implementation of Lunr.js 🌖 ☆189 Updated 3 weeks ago
- rstr is a helper module for easily generating random strings of various types. It could be useful for fuzz testing, generating dummy data… ☆89 Updated last year
- A pure Python Levenshtein implementation that's not freaking GPL'd. ☆97 Updated last year
- A modern CSS selector implementation for BeautifulSoup ☆206 Updated last month
- A Python library to extract tabular data from PDFs ☆50 Updated this week
- A utility to read and write PDFs with Python ☆72 Updated 4 months ago
- Diff Match Patch is a high-performance library in multiple languages that manipulates plain text. ☆49 Updated 3 weeks ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆67 Updated 3 weeks ago
- An open-source package for python to clean raw text data ☆69 Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 Updated last year
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆167 Updated this week
- Fast and memory-efficient Python PDF Parser based on xpdf sources ☆40 Updated 11 months ago
- Bloom filter for Python ☆39 Updated 3 years ago
- XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml ☆72 Updated 3 weeks ago
- Demos, examples and utilities using PyMuPDF ☆578 Updated 4 months ago
- Atom, RSS and JSON feed parser for Python 3 ☆116 Updated 2 years ago
- Parse numbers written in natural language ☆109 Updated last month
- Efficient string matching with regular expressions ☆138 Updated this week
- Python library for fast approximate string matching using Jaro and Jaro-Winkler similarity ☆64 Updated 10 months ago
- Common interface for data container classes ☆62 Updated this week
- ASCII transliterations of Unicode text - GitHub mirror ☆531 Updated 6 months ago
- Pure-python library for adding annotations to PDFs ☆198 Updated 3 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆99 Updated 3 weeks ago
- Simple, Pythonic extraction of text, shapes and images from PDFs ☆78 Updated 4 years ago
- A Python binding of SQLite Full Text Search Tokenizer ☆46 Updated last month