ashutoshvarma / pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
☆42 Updated last year
Alternatives and similar repositories for pyxpdf
Users who are interested in pyxpdf compare it to the libraries listed below.
- Python API for PDF documents ☆124 Updated last year
- Python binding to Poppler-cpp pdf library ☆113 Updated last year
- The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity ☆120 Updated 7 months ago
- Fastest general-purpose parsing library for Python with a familiar API ☆48 Updated 4 months ago
- A Python tool to help extracting information from structured PDFs. ☆417 Updated 2 weeks ago
- Python library for fast approximate string matching using Jaro and Jaro-Winkler similarity ☆74 Updated last year
- Python difflib with parts reimplemented in C ☆40 Updated 9 months ago
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆221 Updated this week
- Safely evaluate AST nodes without side effects ☆47 Updated last year
- Parse numbers written in natural language ☆123 Updated last year
- A Python binding of SQLite Full Text Search Tokenizer ☆48 Updated last week
- A general purpose PDF text-layer redaction tool for Python 2/3. ☆204 Updated last year
- Pandoc (Python Library) ☆173 Updated 3 weeks ago
- Pure Python cross-platform pyclean. Clean up your Python bytecode. ☆78 Updated last week
- A fast RLock implementation for CPython ☆30 Updated 10 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆192 Updated last week
- A Python implementation of Lunr.js 🌖 ☆200 Updated 7 months ago
- XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml ☆86 Updated last week
- Efficient string matching with regular expressions ☆145 Updated this week
- mirror of https://hg.reportlab.com/hg-public/reportlab ☆74 Updated 3 weeks ago
- Memoization for python functions (based on Flask-Cache) ☆30 Updated last month
- MetaDict is a powerful dict subclass enabling (nested) attribute-style item access/assignment and IDE autocompletion support. ☆37 Updated 3 months ago
- Diff Match Patch is a high-performance library in multiple languages that manipulates plain text. ☆55 Updated 9 months ago
- ☆45 Updated last year
- Find parts of long text or data, allowing for some changes/typos. ☆332 Updated 5 months ago
- Python JSON benchmarking and "correctness". ☆35 Updated 2 years ago
- A simple python wrapper for PDFium. ☆17 Updated 3 years ago
- A high performance python hash table library that is generally faster and consumes significantly less memory than Python Dictionaries ☆214 Updated 2 years ago
- Pure python implementation of identifying files based off their magic numbers ☆215 Updated 3 months ago
- Visual Automata is a Python 3 library built as a wrapper for the Automata library to add more visualization features. ☆57 Updated 2 years ago