witwall / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆21Updated 9 years ago
Alternatives and similar repositories for pdf2htmlEX
Users who are interested in pdf2htmlEX are comparing it to the libraries listed below
- Plugin to use rich text in Annotator☆30Updated 10 years ago
- Easily explore, view and edit markdown documentation of a file tree☆66Updated last year
- Recipes for calibre☆69Updated 11 years ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR …☆65Updated last year
- Jash - JavaScript Shell☆45Updated 4 years ago
- PdfJs-Annotator is a proof of concept project that integrates AnnotatorJs (http://annotatorjs.org/) with the PdfJs (https://mozilla.githu…☆25Updated 5 years ago
- A toolkit for clustering web pages based on various similarity measures.☆33Updated 3 years ago
- A Python library to detect and extract listing data from HTML pages.☆108Updated 8 years ago
- TagUI Editor for Browser Automation (chrome-firefox)☆58Updated last week
- ☆23Updated last year
- Adds read support for Excel files (xls and xlsx) to agate.☆17Updated 3 months ago
- Data Store for Annotation Studio☆46Updated 2 years ago
- A library for extracting tables from PDF files☆90Updated 4 years ago
- An online annotation platform for teaching and learning in the humanities.☆108Updated this week
- ☆16Updated last year
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR).☆24Updated 10 years ago
- Exports XMind Mindmap to any documents with Pandoc.☆32Updated 11 years ago
- Python client for Docverter service (pandoc as a service)☆17Updated 7 years ago
- Ideas for (tech) stuff to research, build or work on.☆50Updated 5 months ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆25Updated 7 years ago
- copy of pdftohtml code with enhancements☆25Updated last year
- Google Refine extension for adding columns (extending data) from DBpedia☆39Updated 11 years ago
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR☆33Updated 9 years ago
- Scrapy pipeline which allows you to store scrapy items in appery.io database.☆14Updated 8 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images.☆24Updated 9 years ago
- A place to collect and share knowledge about liberating data from PDFs☆54Updated 3 years ago
- Python library for manipulating Open Packaging Convention (OPC) files like .docx, .pptx, and .xlsx☆46Updated 8 years ago
- HtmlClipper is a bookmarklet which lets you copy html sections of any web pages together with the attached css styles.☆67Updated 3 years ago
- A simple viewer and inspection tool for text boxes in PDF documents☆95Updated 3 years ago
- Lacuna: Digital Annotation for Teaching and Learning☆37Updated 6 years ago