witwall / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆21 · Updated 9 years ago
Alternatives and similar repositories for pdf2htmlEX
Users interested in pdf2htmlEX are comparing it to the libraries listed below.
- Plugin to use rich text in Annotator · ☆30 · Updated 10 years ago
- visualization using Wikidata data · ☆7 · Updated 10 months ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR) · ☆24 · Updated 10 years ago
- An online sentiment analyzer built with Flask and TextBlob · ☆15 · Updated 11 years ago
- Binary Python bindings for poppler utils for content extraction · ☆42 · Updated 4 years ago
- A simple web crawler for stackshare.io using Scrapy · ☆9 · Updated 6 years ago
- Jash - JavaScript Shell · ☆45 · Updated 4 years ago
- ☆38 · Updated 9 years ago
- Google Refine extension for adding columns (extending data) from DBpedia · ☆39 · Updated 11 years ago
- ☆23 · Updated last year
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz · ☆38 · Updated last year
- Offline storage for the Annotator · ☆43 · Updated 7 years ago
- An online annotation platform for teaching and learning in the humanities · ☆108 · Updated 3 months ago
- Java program to add bookmarks to PDF (stable) · ☆27 · Updated 4 years ago
- Near-duplicate detection tool · ☆24 · Updated 8 years ago
- A library for extracting tables from PDF files · ☆89 · Updated 4 years ago
- GUI text-based speech and music editor for creating radio/audio stories · ☆77 · Updated 2 years ago
- Wikidata properties · ☆9 · Updated last year
- Sauna - a social news reader and curation tool · ☆55 · Updated 10 years ago
- Adds read support for Excel files (xls and xlsx) to agate · ☆17 · Updated 3 months ago
- HTML5 Customizable Reader & Admin Console - Librelio Digital Publishing Suite · ☆29 · Updated 9 years ago
- PageArchiver (previously called "Scrapbook for SingleFile") is a Chrome extension that helps archive pages for offline reading · ☆88 · Updated 12 years ago
- Chrome extension that sends your tabs to sleep, like OneTab but without removing them from the tab bar · ☆77 · Updated 7 years ago
- HtmlClipper is a bookmarklet that lets you copy HTML sections of any web page together with their attached CSS styles · ☆67 · Updated 3 years ago
- Convert text from PDF to XML · ☆45 · Updated 6 years ago
- Part of eMOP: Franken+ tool for creating font training for the Tesseract OCR engine from page images · ☆24 · Updated 9 years ago
- Parsing and extracting information from (possibly malformed) HTML/XML documents · ☆10 · Updated last year
- Batch convert PDF files to text under Windows, using several text extraction methods or OCR · ☆33 · Updated 9 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… · ☆48 · Updated 3 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) · ☆25 · Updated 7 years ago