Norconex / importer
Norconex Importer is a Java library and command-line application that parses a file of any format (HTML, PDF, Word, etc.) and extracts its content as plain text. It also lets you manipulate the extracted text before using it in your own service or application.
☆33 · Updated last month
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV ☆80 · Updated 2 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 · Updated last week
- Annotated Excel parsing library to simplify parsing Excel sheets in Java ☆89 · Updated last year
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine… ☆35 · Updated 10 years ago
- Apache POI builder ☆54 · Updated 2 years ago
- Java2word is a library to generate MS Word documents from Java code without any special components. ☆96 · Updated 3 years ago
- Please use the luke bundled with lucene! This repo is archived and frozen now. ☆101 · Updated 6 years ago
- Pdf2Dom is a PDF parser that converts the documents to an HTML DOM representation. The obtained DOM tree may be then serialized to a HTM… ☆189 · Updated 2 years ago
- Converts XHTML to OpenXML WordML (docx) using docx4j ☆143 · Updated last month
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b… ☆209 · Updated last week
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under… ☆108 · Updated 3 months ago
- SimMetrics is a Similarity Metric Library, based on previous work by http://sourceforge.net/projects/simmetrics/ ☆11 · Updated 9 years ago
- Office 365 client for Java ☆51 · Updated 2 years ago
- The FSS (file storage service) APIs make storing blob files easy and simple. ☆41 · Updated 2 years ago
- API to throttle/rate-limit requests ☆56 · Updated 9 years ago
- JODConverter automates document conversions using LibreOffice/OpenOffice.org ☆35 · Updated 8 years ago
- Java 17 Library with tons of utility classes required in all projects ☆35 · Updated this week
- COPPER - a high performance Java workflow engine ☆278 · Updated 2 weeks ago
- An extensible Java library to create thumbnails of different file types (image, text) ☆48 · Updated 3 years ago
- A Generic (n-ary) Tree implementation in Java ☆104 · Updated 8 years ago
- Convert Word documents to simple and clean HTML ☆275 · Updated this week
- XML manipulation library in Java built on a Fluent API ☆30 · Updated last year
- Excel reporting library for Java ☆45 · Updated 8 years ago
- (Java) A Method to Extract Tabular Content from PDF Files ☆335 · Updated 2 years ago
- Brix CMS ☆126 · Updated last year
- Neuro4j Workflow is a light-weight workflow engine for Java with Eclipse-based development environment. Workflow allows to build reusable… ☆61 · Updated 6 years ago
- Aspirin is an embeddable send-only SMTP server. ☆92 · Updated 4 years ago
- jMimeMagic is a Java library for determining the MIME type of files or streams. ☆207 · Updated 3 years ago
- Adds line-breaking, page-breaking, tables, and styles to PDFBox ☆47 · Updated 2 years ago
- Java JSON to XML converter ☆66 · Updated 2 weeks ago