booktype / python-ooxml
Python library for parsing .docx (Office Open XML) files
☆52 Updated 4 years ago
Related projects:
- Python library for manipulating Open Packaging Convention (OPC) files like .docx, .pptx, and .xlsx ☆42 Updated 7 years ago
- Python 3 port of pdfminer ☆188 Updated 6 years ago
- An extendable docx file format parser and converter ☆185 Updated 3 years ago
- Fast multi-keyword search engine for text strings ☆248 Updated last week
- Python bindings for CHMLIB ☆55 Updated 10 months ago
- A fast, pure-Python, untyped, in-memory database engine, using Python syntax to manage data, instead of SQL, inspired by PyDbLite. ☆20 Updated 6 years ago
- High performance Trie and Aho-Corasick automata (AC automata) Keyword Match & Replace Tool for Python ☆94 Updated last year
- Create, read, and modify Excel .xlsx files ☆103 Updated 3 years ago
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆244 Updated 7 months ago
- Python workflow engine ☆59 Updated 3 years ago
- Python extension module for accelerating regular expressions using libesm ☆132 Updated 11 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆158 Updated this week
- A simple Python wrapper for PDFium. ☆15 Updated 2 years ago
- Python to JavaScript translator ☆93 Updated 7 years ago
- Python bindings for lib7zip ☆34 Updated 4 years ago
- A pure Python based utility to extract text and images from docx files. ☆504 Updated 11 months ago
- XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml ☆67 Updated last week
- A utility to read and write PDFs with Python. Superseded: see https://github.com/knowah/PyPDF2 ☆80 Updated 8 years ago
- Constants used in Chinese text processing ☆355 Updated last year
- Pure Python Aho-Corasick library. ☆209 Updated last year
- 🐍 A CPython extension for the Hyperscan regular expression matching library. ☆165 Updated 6 months ago
- An efficient simhash implementation for Python ☆124 Updated 4 years ago
- Python CFFI wrapper for LibreOfficeKit ☆54 Updated 4 years ago
- Un/packs an MHT (MHTML) archive into/from separate files, writing/reading them in directories to match their Content-Location. ☆79 Updated 2 years ago
- A Python tool to help extract information from structured PDFs. ☆368 Updated 3 weeks ago
- PDF to Markdown with Python 3 ☆11 Updated 4 years ago
- Python binding to libpoppler-qt5 ☆42 Updated 10 months ago
- A few useful functions and objects for manipulating IP addresses in Python. ☆70 Updated 4 years ago
- Convert a docx (OOXML) file to HTML. This project is deprecated in favor of https://github.com/OpenScienceFramework/pydocx ☆44 Updated 10 years ago
- Python binding to libpoppler with focus on text extraction ☆98 Updated 2 years ago