Norconex / importer
Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it lets you apply any manipulation to the extracted text before using it in your own service or application.
☆34Updated last month
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV☆72Updated 2 years ago
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi…☆190Updated last month
- Implementation of Norconex Committer for Elasticsearch.☆11Updated 3 years ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under…☆106Updated last month
- Pdf2Dom is a PDF parser that converts the documents to an HTML DOM representation. The obtained DOM tree may be then serialized to a HTM…☆185Updated 2 years ago
- Please use the luke bundled with lucene! This repo is archived and frozen now.☆101Updated 6 years ago
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw…☆22Updated 9 months ago
- Apache POI builder☆54Updated 2 years ago
- A Java Library that interfaces with GNU Gettext and Java i18n Facilities to Make i18n Easier☆31Updated 6 years ago
- Convert Java lambdas to SQL statements. Build type safe and readable queries.☆65Updated 5 years ago
- Simple web pivot table using Apache Wicket☆25Updated last year
- Chromium-based headless browser for java☆28Updated 8 years ago
- Small set of tools allowing you to create secure encrypted tokens, which can be later exchanged with 3rd party systems or stored as a lic…☆82Updated 9 years ago
- Expression Engine in Java for Excel/Google spreadsheet style formulas☆32Updated 9 years ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b…☆207Updated last month
- Office 365 client for Java☆49Updated 2 years ago
- Neuro4j Workflow is a light-weight workflow engine for Java with Eclipse-based development environment. Workflow allows to build reusable…☆60Updated 6 years ago
- This is a clone of an SVN repository at http://anonsvn.icesoft.org/repo/icepdf/trunk/icepdf. It had been cloned by http://svn2github.com/…☆13Updated 7 years ago
- Provides simplified access to the ElasticSearch Java API.☆4Updated 4 years ago
- The SQL Processor is an engine producing the ANSI SQL statements and providing their execution without the necessity to write Java plumbi…☆28Updated last month
- A web-based application for authoring BPMN 2.0 process specifications☆15Updated 9 years ago
- JDBC driver for CSV☆70Updated 7 years ago
- A dynamic SQL builder for Java language.☆72Updated last month
- jStyleParser is a CSS parser written in Java. It has its own application interface that is designed to allow an efficient CSS processing …☆95Updated 7 months ago
- Library for performing the comparison operations between texts☆85Updated 4 years ago
- A unique ID generator that specialises in small IDs.☆54Updated last month
- Java implementation of most Excel formula functions.☆37Updated 3 years ago
- Java2word is a Library to generate MS Word Documents from Java code without any special components.☆95Updated 3 years ago
- jORM is a Lightweight Java ORM☆37Updated 6 years ago
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine…☆35Updated 10 years ago