witwall / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆21 Updated 10 years ago
Alternatives and similar repositories for pdf2htmlEX
Users who are interested in pdf2htmlEX are comparing it to the libraries listed below.
- Python library for manipulating Open Packaging Convention (OPC) files like .docx, .pptx, and .xlsx ☆47 Updated 8 years ago
- An extendable docx file format parser and converter ☆193 Updated 6 months ago
- Recipes for calibre ☆69 Updated 12 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 Updated this week
- HTML5 Customizable Reader & Admin Console - Librelio Digital Publishing Suite ☆29 Updated 10 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆96 Updated 3 years ago
- A library for extracting tables from PDF files ☆89 Updated 12 years ago
- A Python library to detect and extract listing data from an HTML page. ☆108 Updated 8 years ago
- Create, read, and modify Excel .xlsx files ☆113 Updated 5 years ago
- A natural language date parser. (Python version of chrono.js) ☆25 Updated 6 months ago
- Cytoscape 3 desktop version. ☆17 Updated last month
- 1-Click to make ePUB, MOBI, PDF with Word Addin ☆28 Updated 11 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆25 Updated 10 years ago
- A library for extracting tables from PDF files ☆92 Updated 5 years ago
- Linguistic search for large annotated text corpora, based on Apache Lucene ☆116 Updated this week
- Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. ☆133 Updated 2 years ago
- Tesseract documentation ☆75 Updated 4 years ago
- Diagramo - pure HTML5 JavaScript diagram / flowchart editor ☆543 Updated 3 years ago
- The OneNote Web Clipper extension ☆346 Updated last month
- Python client for Docverter service (pandoc as a service) ☆17 Updated 7 years ago
- vsdx - A Python library for processing .vsdx files ☆88 Updated last year
- Clone of docfetcher from sourceforge ☆63 Updated 11 years ago
- Extract tables from PDF pages. ☆298 Updated 5 years ago
- ☆14 Updated 3 years ago
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/… ☆131 Updated 2 years ago
- ☆23 Updated 2 years ago
- Various Annotorious plugins that add additional image selection tools ☆21 Updated 7 years ago
- Terminology management web platform ☆49 Updated 3 years ago
- code cola is a Chrome extension for visually editing the CSS styles of online pages. ☆104 Updated last year
- A powerful Python library for parsing media metadata, which can extract metadata (such as id3 tags, for example) from a wide range of med… ☆23 Updated last month