Norconex / importer
Norconex Importer is a Java library and command-line application that parses a file and extracts its content as plain text, whatever its format (HTML, PDF, Word, etc.). It also lets you manipulate the extracted text before using it in your own service or application.
☆34Updated this week
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆191Updated this week
- Apache POI builder☆54Updated 2 years ago
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV☆78Updated 2 years ago
- Annotated Excel parsing library to simplify parsing Excel sheets in Java☆85Updated last year
- Java library to use the XML-RPC functionality of WordPress☆77Updated 4 years ago
- Java EE Cache Filter☆36Updated 6 years ago
- Generic Excel File (XLSX) Reader using Apache POI☆32Updated 7 years ago
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine…☆35Updated 10 years ago
- COPPER - a high performance Java workflow engine☆276Updated last year
- Java 11 Library with tons of utility classes required in all projects☆34Updated this week
- Powerful, hierarchical desktop search engine based on Swing and Lucene.☆18Updated 8 years ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under…☆107Updated 2 months ago
- A Generic (n-ary) Tree implementation in Java☆103Updated 8 years ago
- A web platform to rapidly build forms for data management and business automation.☆184Updated this week
- JODConverter automates document conversions using LibreOffice/OpenOffice.org☆35Updated 8 years ago
- Brix CMS☆126Updated last year
- Pdf2Dom is a PDF parser that converts the documents to an HTML DOM representation. The obtained DOM tree may be then serialized to a HTM…☆186Updated 2 years ago
- A set of reusable Java components that implement functionality common to any web crawler☆244Updated last week
- Convert Word documents to simple and clean HTML☆270Updated last month
- YaHP is a Java library that allows you to convert an HTML document into a PDF document.☆56Updated 13 years ago
- Java2word is a Library to generate MS Word Documents from Java code without any special components.☆95Updated 3 years ago
- Please use the Luke bundled with Lucene! This repo is archived and frozen now.☆101Updated 6 years ago
- Json to Java source code generator for Jackson (see the wiki https://github.com/astav/JsonToJava/wiki/JsonToJava)☆96Updated 12 years ago
- Java agent string parser based on Udger https://udger.com/products/local_parser☆26Updated 2 years ago
- Mirror of the SqlBuilder project: http://openhms.sourceforge.net/sqlbuilder/☆171Updated 4 years ago
- Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with pivot style GUI.☆130Updated 3 years ago
- Converts XHTML to OpenXML WordML (docx) using docx4j☆145Updated last week
- ElasticSearch Java API tutorial using test cases.☆112Updated 3 years ago
- A bundle of html content extraction algorithms☆122Updated 10 years ago
- Java GUI and Tools for Tesseract OCR☆331Updated last year