Norconex / importer
Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it allows you to perform any manipulation on the extracted text before using it in your own service or application.
☆33Updated 4 months ago
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV☆80Updated 2 years ago
- Apache POI builder☆54Updated 2 years ago
- Annotated Excel parsing library to simplify parsing Excel sheets in Java☆89Updated last year
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine…☆35Updated 11 years ago
- YaHP is a Java library that allows you to convert an HTML document into a PDF document.☆55Updated 14 years ago
- SimMetrics is a Similarity Metric Library, based on previous work by http://sourceforge.net/projects/simmetrics/☆11Updated 9 years ago
- Java2word is a Library to generate MS Word Documents from Java code without any special components.☆97Updated 3 years ago
- Adds line-breaking, page-breaking, tables, and styles to PDFBox☆47Updated 2 years ago
- Shiro webapp using the buji-pac4j bridge and the javaee-pac4j security library☆85Updated last week
- Small set of tools allowing you to create secure encrypted tokens, which can be later exchanged with 3rd party systems or stored as a lic…☆82Updated 10 years ago
- Tiny License Framework for Java☆69Updated 7 years ago
- Implementation of the new headless chrome with chromedriver and selenium.☆38Updated 6 years ago
- Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with pivot style GUI.☆129Updated 3 years ago
- Java EE Cache Filter☆36Updated 6 years ago
- A Java Backend for jQuery-QueryBuilder☆62Updated 7 years ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under…☆107Updated 6 months ago
- Pdf2Dom is a PDF parser that converts the documents to an HTML DOM representation. The obtained DOM tree may then be serialized to a HTM…☆191Updated this week
- Java cloud based CMS library for dynamic content.☆34Updated 9 years ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b…☆212Updated last month
- Expression Engine in Java for Excel/ Google spreadsheet style formulas☆32Updated 10 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆197Updated this week
- Brix CMS☆127Updated last year
- Java library for receiving chunked file uploads☆12Updated 10 years ago
- Excel reporting library for Java☆45Updated 9 years ago
- jStyleParser is a CSS parser written in Java. It has its own application interface that is designed to allow an efficient CSS processing …☆95Updated 3 weeks ago
- Export docx to PDF via XSL FO, using FOP☆48Updated last year
- Powerful, hierarchy-based desktop search engine built on Swing and Lucene.☆18Updated 8 years ago
- jVoiD - eXtensible Java, Spring MVC, Hibernate eCommerce Shopping Cart☆30Updated 10 years ago
- API to throttle/rate-limit requests☆56Updated 9 years ago
- Please use the luke bundled with lucene! This repo is archived and frozen now.☆101Updated 7 years ago