radkovo / Pdf2Dom
Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The obtained DOM tree may then be serialized to an HTML file or processed further. A command-line utility for converting PDF documents to HTML is included in the distribution package. Pdf2Dom may also be used as an independent Java library with a standard DOM …
☆180 · Updated 2 years ago
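Pdf2Dom produces a standard DOM that can be serialized to HTML or processed further. The sketch below illustrates both steps using only the JDK (`org.w3c.dom` and `javax.xml.transform`). It does not call Pdf2Dom. The `sampleDom()` method builds a small, hypothetical stand-in tree, and the `page`/`p` class names are illustrative assumptions, not necessarily what Pdf2Dom emits. In a real program the `Document` would come from Pdf2Dom itself. Its `org.fit.pdfdom.PDFDomTree` class is believed to expose a `createDOM(PDDocument)` method for this, but check the project's documentation before relying on it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomToHtml {

    /** Builds a tiny hypothetical stand-in DOM; in practice the tree would come from Pdf2Dom (assumption). */
    static Document sampleDom() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element page = doc.createElement("div");
        page.setAttribute("class", "page");      // illustrative class name, not taken from Pdf2Dom
        Element line = doc.createElement("div");
        line.setAttribute("class", "p");         // illustrative class name, not taken from Pdf2Dom
        line.setTextContent("Hello from page 1");
        page.appendChild(line);
        body.appendChild(page);
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    /** Serializes any W3C DOM Document to an HTML string with the JDK's identity transformer. */
    static String toHtml(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");   // HTML output method: no XML declaration
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** One kind of "further processing": counts elements whose class attribute equals cls. */
    static int countByClass(Document doc, String cls) {
        NodeList all = doc.getElementsByTagName("*");     // every element, in document order
        int n = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (cls.equals(e.getAttribute("class"))) {    // getAttribute returns "" when absent
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        Document doc = sampleDom();
        System.out.println("pages: " + countByClass(doc, "page"));
        System.out.println(toHtml(doc));
    }
}
```

Because `toHtml` depends only on the `org.w3c.dom.Document` interface, the same method should work unchanged on a DOM tree from any parser that produces a standard W3C DOM.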
Alternatives and similar repositories for Pdf2Dom:
Users who are interested in Pdf2Dom are comparing it to the libraries listed below:
- Test area for public PDFBox v2 issues on stackoverflow etc ☆84 · Updated 5 months ago
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV ☆71 · Updated last year
- documents4j is a Java library for converting documents into another document format ☆563 · Updated 6 months ago
- edit a docx using CKEditor via XHTML round trip (with some session state) ☆47 · Updated 7 years ago
- JODConverter automates document conversions using LibreOffice/OpenOffice.org ☆463 · Updated 2 years ago
- jStyleParser is a CSS parser written in Java. It has its own application interface that is designed to allow efficient CSS processing … ☆94 · Updated last month
- pdfHTML is an iText add-on for Java that allows you to easily convert HTML and CSS into standards compliant PDFs that are accessible, sea… ☆238 · Updated this week
- Library for performing the comparison operations between texts ☆85 · Updated 4 years ago
- Convert Word documents to simple and clean HTML ☆259 · Updated last month
- Converts XHTML to OpenXML WordML (docx) using docx4j ☆141 · Updated 6 months ago
- Java wrapper for Ghostscript C API + PS/PDF document handling API ☆65 · Updated last year
- Java font converter library. ☆45 · Updated 5 months ago
- Export docx to PDF via XSL FO, using FOP ☆46 · Updated 11 months ago
- CSSBox is an (X)HTML/CSS rendering engine written in pure Java. Its primary purpose is to provide complete information about the render… ☆243 · Updated last month
- Type-safe Java/COM binding ☆146 · Updated last year
- JODConverter automates document conversions using LibreOffice/OpenOffice.org ☆35 · Updated 7 years ago
- JPEG2000 support for Java Advanced Imaging Image I/O Tools API ☆76 · Updated last year
- Java JNA Wrapper for Leptonica Image Processing Library ☆29 · Updated last week
- JAI ImageIO Core (without javax.media.jai dependencies) ☆236 · Updated last year
- Adds line-breaking, page-breaking, tables, and styles to PDFBox ☆47 · Updated last year
- Hunspell library for Java based on JNA ☆62 · Updated last year
- Automatically exported from code.google.com/p/java-html2image ☆136 · Updated last year
- Apache XML Graphics FOP ☆189 · Updated this week
- A simple, high-performance, small memory footprint, pull based XML parser ☆34 · Updated 7 years ago
- Web Browser, Flash Player, HTML editor, Media player for Swing ☆196 · Updated last year
- Visual comparison of HTML in Java ☆80 · Updated 5 months ago
- Java GUI and Tools for Tesseract OCR ☆327 · Updated last year
- A Java wrapper around the PhantomJS binaries including a packaged HTML to PDF render script ☆52 · Updated 6 years ago
- A Java ImageIO plugin for the JBIG2 bi-level image format ☆32 · Updated 2 years ago
- Java servlet that provides an implementation of the webdav protocol. Underlying data-storage (database, custom file systems) can be easil… ☆55 · Updated 3 years ago