witwall / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆21 Updated 10 years ago
Alternatives and similar repositories for pdf2htmlEX
Users who are interested in pdf2htmlEX are comparing it to the libraries listed below.
- A library for extracting tables from PDF files ☆92 Updated 5 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 Updated 3 weeks ago
- Sublime Text plugin for easier cursor navigation of XML and HTML files using XPath 1.0. ☆45 Updated last year
- A natural language date parser. (Python version of chrono.js) ☆25 Updated 6 months ago
- Tesseract documentation ☆75 Updated 4 years ago
- ☆80 Updated 2 years ago
- Plugin to use rich text in Annotator ☆30 Updated 11 years ago
- HTML5 Customizable Reader & Admin Console - Librelio Digital Publishing Suite ☆29 Updated 10 years ago
- A pair of scripts to download videos and subtitles for the TED Talks (http://www.ted.com) ☆42 Updated 11 years ago
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR ☆35 Updated 10 years ago
- Freemind to Markdown Converter ☆49 Updated 8 years ago
- 1-Click to make ePUB, MOBI, PDF with Word Addin ☆28 Updated 11 years ago
- see also section scraping on custom levels of depth ☆89 Updated 10 months ago
- A dynamic media input form developed for oTranscribe ☆18 Updated 10 years ago
- Tesseract Powered Windows Desktop OCR Application With Multiple Pre/Post Processing GUI ☆41 Updated last year
- Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. ☆133 Updated 2 years ago
- Create, read, and modify Excel .xlsx files ☆113 Updated 5 years ago
- BIBFRAME Datastore is a Linked-Data project for managing bibliographic records and operational data focused on libraries and other simila… ☆16 Updated 10 years ago
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/… ☆131 Updated 2 years ago
- Chrome extension for XPaths operations done the right way. ☆44 Updated 6 years ago
- Chrome extension to select and copy table cells. ☆138 Updated 2 years ago
- Exports XMind Mindmap to any documents with Pandoc. ☆32 Updated 12 years ago
- Recipes for calibre ☆69 Updated 12 years ago
- Artificial Intelligence Knowledge Information Framework ☆55 Updated 2 years ago
- Extract meaningful content from PDF and PSD files, such as text and images, both linked into a common JSON string ☆36 Updated 7 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆26 Updated 10 years ago
- Linguistic search for large annotated text corpora, based on Apache Lucene ☆117 Updated this week
- Python/Flask-based website for text analysis workflow. Previous (stable) release is live at: ☆122 Updated last year
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features. ☆108 Updated 8 months ago
- A Python library to detect and extract listing data from HTML pages. ☆108 Updated 8 years ago