Norconex / importer
Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it allows you to perform any manipulation on the extracted text before using it in your own service or application.
☆33 Updated 2 months ago
Alternatives and similar repositories for importer
Users that are interested in importer are comparing it to the libraries listed below
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆194 Updated 2 weeks ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under… ☆108 Updated 5 months ago
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV ☆80 Updated 2 years ago
- Java2word is a Library to generate MS Word Documents from Java code without any special components. ☆96 Updated 3 years ago
- Annotated Excel parsing library to simplify parsing excel sheet in JAVA ☆89 Updated last year
- Converts XHTML to OpenXML WordML (docx) using docx4j ☆144 Updated last week
- Please use the luke bundled with lucene! This repo is archived and frozen now. ☆101 Updated 6 years ago
- SimMetrics is a Similarity Metric Library, based on previous work by http://sourceforge.net/projects/simmetrics/ ☆11 Updated 9 years ago
- Apache POI builder ☆54 Updated 2 years ago
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine… ☆35 Updated 10 years ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b… ☆211 Updated last week
- Office 365 client for Java ☆51 Updated 2 years ago
- Pdf2Dom is a PDF parser that converts the documents to a HTML DOM representation. The obtained DOM tree may be then serialized to a HTM… ☆190 Updated 3 years ago
- Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with pivot style GUI. ☆130 Updated 3 years ago
- Java EE Cache Filter ☆36 Updated 6 years ago
- Neuro4j Workflow is a light-weight workflow engine for Java with Eclipse-based development environment. Workflow allows to build reusable… ☆62 Updated 6 years ago
- A Generic (n-ary) Tree implementation in Java ☆104 Updated 8 years ago
- Code materials for ADF Faces Cookbook ☆14 Updated 5 years ago
- Qzui, a REST and Web front end over Quartz Scheduler ☆72 Updated 9 years ago
- Provides a capability to tail log files in a web browser, implemented using websockets. ☆63 Updated 10 years ago
- Json to Java source code generator for Jackson (see the wiki https://github.com/astav/JsonToJava/wiki/JsonToJava) ☆98 Updated 12 years ago
- YaHP is a Java library that allows you to convert an HTML document into a PDF document. ☆56 Updated 13 years ago
- Brix CMS ☆126 Updated last year
- COPPER - a high performance Java workflow engine ☆277 Updated this week
- Implementation of the new headless chrome with chromedriver and selenium. ☆38 Updated 6 years ago
- Constellio 8 ☆23 Updated 4 years ago
- Celerio is a code generator tool for data-driven application. ☆82 Updated 8 years ago
- OpenL Tablets Business Rules Management System ☆179 Updated last week
- Adds line-breaking, page-breaking, tables, and styles to PDFBox ☆47 Updated 2 years ago
- Beautiful and interactive javascript charts for Java-based web applications. ☆91 Updated last year