maxpmaxp / pdfreader
Python API for PDF documents
☆124 · Updated 11 months ago
Alternatives and similar repositories for pdfreader
Users who are interested in pdfreader are comparing it to the libraries listed below.
- A Python tool to help extracting information from structured PDFs. ☆411 · Updated 2 weeks ago
- Python binding to Poppler-cpp pdf library ☆111 · Updated 11 months ago
- A Python implementation of Lunr.js 🌖 ☆199 · Updated 5 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆189 · Updated this week
- python library to simplify working with jsonlines and ndjson data ☆297 · Updated last year
- Python library for fast approximate string matching using Jaro and Jaro-Winkler similarity ☆73 · Updated last year
- Parse numbers written in natural language ☆122 · Updated 10 months ago
- Pandoc (Python Library) ☆163 · Updated 11 months ago
- Efficient string matching with regular expressions ☆145 · Updated last week
- Python interface to Apache PDFBox command-line tools. ☆77 · Updated 2 years ago
- A utility to read and write PDFs with Python ☆337 · Updated 3 years ago
- Efficient Trie-based regex unions for blacklist/whitelist filtering and one-pass mapping-based string replacing ☆74 · Updated last month
- Simplify DOCX files to JSON ☆248 · Updated 11 months ago
- Pythonic search engine based on PyLucene. ☆129 · Updated this week
- Simple python wrapper to convert HTML to PDF with headless Chrome via selenium ☆74 · Updated 7 months ago
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆212 · Updated 2 weeks ago
- A purely-functional HTML builder for Python. Think JSX rather than templates. ☆100 · Updated 7 months ago
- A python package to simulate typographical errors. ☆37 · Updated last year
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆154 · Updated 2 years ago
- A Domain Specific Language (DSL) for building language patterns. These can be later compiled into spaCy patterns, pure regex, or any othe… ☆68 · Updated 2 years ago
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆117 · Updated 5 months ago
- URL normalization for Python ☆97 · Updated 4 months ago
- A python based HTML to text conversion library, command line client and Web service. ☆315 · Updated 3 weeks ago
- Find parts of long text or data, allowing for some changes/typos. ☆328 · Updated 3 months ago
- Extract dates from text ☆65 · Updated 4 years ago
- A Python library for working with and comparing language codes. ☆346 · Updated 3 months ago
- Easy rate-limiting for python requests ☆107 · Updated last week
- Pure-python library for adding annotations to PDFs ☆206 · Updated 4 years ago
- A modern CSS selector implementation for BeautifulSoup ☆245 · Updated 3 weeks ago
- The most basic Text::Unidecode port (licensed under Artistic License or GPL or GPLv2+ - choose whatever you want) ☆67 · Updated 2 years ago