Norconex / importer
Norconex Importer is a Java library and command-line application that parses a file and extracts its content as plain text, whatever the format (HTML, PDF, Word, etc.). It also lets you manipulate the extracted text before using it in your own service or application.
☆33 Updated 2 months ago
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Annotated Excel parsing library to simplify parsing Excel sheets in Java ☆89 Updated last year
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 Updated 2 weeks ago
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine… ☆35 Updated 10 years ago
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV ☆80 Updated 2 years ago
- Please use the Luke bundled with Lucene! This repo is archived and frozen now. ☆101 Updated 7 years ago
- Java library to use the XML-RPC functionality of WordPress ☆78 Updated 4 years ago
- Java2word is a library to generate MS Word documents from Java code without any special components. ☆96 Updated 3 years ago
- Generic Excel File (XLSX) Reader using Apache POI ☆32 Updated 7 years ago
- jORM is a lightweight Java ORM ☆37 Updated 6 years ago
- Apache POI builder ☆54 Updated 2 years ago
- JODConverter automates document conversions using LibreOffice/OpenOffice.org ☆35 Updated 8 years ago
- Brix CMS ☆126 Updated last year
- Elasticsearch Java API tutorial using test cases. ☆112 Updated 3 years ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under… ☆108 Updated 5 months ago
- Pdf2Dom is a PDF parser that converts the documents to an HTML DOM representation. The obtained DOM tree may then be serialized to an HTM… ☆190 Updated 3 years ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b… ☆211 Updated 2 weeks ago
- Implementation of the new headless chrome with chromedriver and selenium. ☆38 Updated 6 years ago
- Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with pivot style GUI. ☆130 Updated 3 years ago
- This is a sample java web application that demonstrates the integration of Apache Mahout with a database driven Spring based application … ☆42 Updated 12 years ago
- Converts XHTML to OpenXML WordML (docx) using docx4j ☆144 Updated last week
- Java agent string parser based on Udger https://udger.com/products/local_parser ☆26 Updated 2 years ago
- Single file examples and ready-to-use servers show how to use the parallec.io library. Examples to aggregate APIs and publish to Elastic Sear… ☆92 Updated 8 years ago
- Small set of tools allowing you to create secure encrypted tokens, which can be later exchanged with 3rd party systems or stored as a lic… ☆82 Updated 10 years ago
- QuartzDesk Executor (QE) is a scalable and generic job scheduling application that can be used to schedule execution of native shell scri… ☆22 Updated 10 months ago
- COPPER - a high performance Java workflow engine ☆277 Updated this week
- SimMetrics is a Similarity Metric Library, based on previous work by http://sourceforge.net/projects/simmetrics/ ☆11 Updated 9 years ago
- Adds line-breaking, page-breaking, tables, and styles to PDFBox ☆47 Updated 2 years ago
- Shiro webapp using the buji-pac4j bridge and the javaee-pac4j security library ☆84 Updated 3 weeks ago
- Java OCR allows you to perform OCR and bar code recognition on images (JPEG, PNG, TIFF, PDF, etc.) and output as plain text, xml with ful… ☆136 Updated 10 years ago
- Java EE Cache Filter ☆36 Updated 6 years ago