ashutoshvarma / pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
☆40 Updated 11 months ago
Related projects
Alternatives and complementary repositories for pyxpdf
- Python binding to Poppler-cpp PDF library ☆98 Updated 2 months ago
- Loadable spellfix1 extension for SQLite as a Python package ☆25 Updated 7 months ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆16 Updated last year
- A Python tool to help extract information from structured PDFs. ☆383 Updated 3 weeks ago
- A Python binding of SQLite Full Text Search Tokenizer ☆46 Updated last month
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 3 months ago
- Python API for PDF documents ☆117 Updated 2 months ago
- Python bindings for RocksDB ☆32 Updated 2 years ago
- Modern internal tools. Defined, controlled, and deployed directly from backend code. No JavaScript. Secure. ☆20 Updated 3 years ago
- Parse numbers written in natural language ☆109 Updated 3 weeks ago
- Master repository which includes most other OCR-D repositories as submodules ☆72 Updated last month
- Detect textlines in document images ☆90 Updated 5 months ago
- Python interface to Apache PDFBox command-line tools. ☆75 Updated last year
- Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s) ☆17 Updated 3 months ago
- Python library for fast approximate string matching using Jaro and Jaro-Winkler similarity ☆64 Updated 10 months ago
- A fast native implementation of diff algorithm with a pure Python fallback ☆38 Updated last year
- Easy to use pattern matching and information extraction for Python ☆38 Updated last year
- A Python module to split a file into multiple chunks based on the given size. ☆66 Updated last month
- A simple Python wrapper for PDFium. ☆15 Updated 2 years ago
- Build a trie-structured regular expression from a list of words ☆21 Updated 5 years ago
- Read big JSON files without consuming lots of memory ☆18 Updated 3 years ago
- uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding and attempts to determine the … ☆38 Updated 5 months ago
- Timing functions, contexts and for-loops ☆86 Updated last month
- Python difflib with parts reimplemented in C ☆32 Updated 2 years ago
- Faster pathlib for Python ☆52 Updated 11 months ago
- An easy to use RPC framework for enabling fast inter-process, inter-container, or inter-host communication ☆63 Updated last year
- Docutils (a.k.a. reStructuredText, reST, RST) support for Django ☆11 Updated this week
- Simple secure asynchronous message queue ☆20 Updated 4 months ago
- A simpler, faster ISO 639 library. ☆34 Updated last week