witwall / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆21Updated 10 years ago
Alternatives and similar repositories for pdf2htmlEX
Users who are interested in pdf2htmlEX are comparing it to the libraries listed below.
- see also section scraping on custom levels of depth☆87Updated 7 months ago
- Lacuna: Digital Annotation for Teaching and Learning☆37Updated 7 years ago
- An online annotation platform for teaching and learning in the humanities.☆108Updated 3 weeks ago
- Plugin to use rich text in Annotator☆30Updated 10 years ago
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR☆35Updated 9 years ago
- Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java.☆133Updated 2 years ago
- Open Video Annotation Project☆111Updated 8 years ago
- Web data extraction tool implemented as a Chrome extension with many more features☆47Updated 6 years ago
- A Python library to detect and extract listing data from HTML pages.☆108Updated 8 years ago
- Extract meaningful content from PDF and PSD files, such as text and images, both linked into a common JSON string☆36Updated 7 years ago
- Jash - JavaScript Shell☆45Updated 5 years ago
- A library for extracting tables from PDF files☆89Updated 11 years ago
- HTML5 Customizable Reader & Admin Console - Librelio Digital Publishing Suite☆29Updated 9 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR).☆24Updated 10 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆25Updated 7 years ago
- Tesseract documentation☆76Updated 4 years ago
- PdfJs-Annotator is a proof of concept project that integrates AnnotatorJs (http://annotatorjs.org/) with the PdfJs (https://mozilla.githu…☆25Updated 5 years ago
- Cytoscape 3 desktop version.☆17Updated last month
- Binary Python bindings for poppler utils for content extraction☆42Updated 4 years ago
- A tree diagram (SVG) generator.☆83Updated 2 years ago
- Publishing Framework for Large-Scale Data-Rich Interactive Web Pages☆179Updated 4 years ago
- ☆49Updated 11 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image…☆96Updated 7 years ago
- Python client for Docverter service (pandoc as a service)☆17Updated 7 years ago
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.☆108Updated 5 months ago
- Manifests of the public domain images uploaded to Flickr Commons, with descriptive information about the books they were taken from.☆75Updated 11 years ago
- Semantic data wiki as well as Linked Data publishing engine☆206Updated last year
- Python library for manipulating Open Packaging Convention (OPC) files like .docx, .pptx, and .xlsx☆46Updated 8 years ago
- Extract tables from PDF pages.☆296Updated 5 years ago
- Execute OpenRefine JSON scripts without OpenRefine (or Java)☆30Updated 2 years ago