Norconex / importer
Norconex Importer is a Java library and command-line application that parses files of any format (HTML, PDF, Word, etc.) and extracts their content as plain text. It also lets you apply any manipulation you need to the extracted text before using it in your own service or application.
☆33 · Updated 5 months ago
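The Importer's own configuration and Java API are not reproduced here. As a hypothetical, plain-JDK illustration of the second capability described above (manipulating extracted text before handing it to your own application), the sketch below chains simple string transformations over text that is assumed to have already been extracted from a file. The class `ExtractedTextPipeline` and its steps are invented for this example and are not part of Norconex Importer.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch, NOT the Norconex Importer API: it only illustrates
 * applying an ordered chain of manipulations to already-extracted plain text.
 */
public class ExtractedTextPipeline {

    // Each step receives the current text and returns a transformed copy.
    private final List<UnaryOperator<String>> steps;

    public ExtractedTextPipeline(List<UnaryOperator<String>> steps) {
        this.steps = List.copyOf(steps);
    }

    /** Runs every step in order; an empty pipeline returns the input unchanged. */
    public String apply(String extractedText) {
        String result = extractedText;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed stand-in for plain text already extracted from an HTML, PDF or Word file.
        String extracted = "  Quarterly   REPORT\n\nRevenue grew.  ";

        List<UnaryOperator<String>> steps = List.of(
                s -> s.replaceAll("\\s+", " "),   // collapse runs of whitespace into one space
                String::trim,                     // drop leading and trailing whitespace
                s -> s.toLowerCase(Locale.ROOT)   // normalize case for indexing or comparison
        );

        ExtractedTextPipeline pipeline = new ExtractedTextPipeline(steps);
        System.out.println(pipeline.apply(extracted)); // prints "quarterly report revenue grew."
    }
}
```

Keeping each manipulation as a separate, ordered step makes it easy to add, remove, or reorder transformations without touching the extraction side.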
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆196 · Updated 3 weeks ago
- Java2word is a library to generate MS Word documents from Java code without any special components. ☆97 · Updated 3 years ago
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV ☆80 · Updated 2 years ago
- Java library to use the XML-RPC functionality of WordPress ☆78 · Updated 4 years ago
- Brix CMS ☆127 · Updated last year
- Please use the Luke bundled with Lucene! This repo is now archived and frozen. ☆102 · Updated 7 years ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b… ☆213 · Updated 3 weeks ago
- Tiny License Framework for Java ☆69 · Updated 7 years ago
- Shiro webapp using the buji-pac4j bridge and the javaee-pac4j security library ☆85 · Updated 2 weeks ago
- Elasticsearch Java API tutorial using test cases. ☆112 · Updated 4 years ago
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine… ☆35 · Updated 11 years ago
- Converts XHTML to OpenXML WordML (docx) using docx4j ☆147 · Updated 3 months ago
- JD eSurvey is an open source enterprise survey web application written in Java and based on the Spring Framework. Check out the tutorial … ☆232 · Updated 4 years ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under… ☆107 · Updated 8 months ago
- YaHP is a Java library that allows you to convert an HTML document into a PDF document. ☆55 · Updated 14 years ago
- Automatically exported from code.google.com/p/java-image-scaling ☆131 · Updated 2 years ago
- This is a simple code generator to scaffold Spring Data JDBC DAO classes ☆34 · Updated 9 years ago
- Apache POI builder ☆54 · Updated 2 years ago
- Convert Word documents to simple and clean HTML ☆284 · Updated 2 months ago
- Single-file examples and ready-to-use servers show how to use the parallec.io library. Examples to aggregate APIs and publish to Elastic Sear… ☆92 · Updated 8 years ago
- Some examples of using JDBI as a persistence framework ☆42 · Updated 10 years ago
- Java cloud-based CMS library for dynamic content. ☆34 · Updated 9 years ago
- Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with a pivot-style GUI. ☆129 · Updated 3 years ago
- Adds line-breaking, page-breaking, tables, and styles to PDFBox ☆47 · Updated 2 years ago
- Formio, a form definition and binding library for the Java platform ☆28 · Updated last month
- An easy-to-implement library for the GeoHash algorithm ☆67 · Updated 4 years ago
- Annotated Excel parsing library to simplify parsing Excel sheets in Java ☆89 · Updated last year
- [RETIRED] Open source e-commerce and marketplaces made simple on the JVM ☆175 · Updated 5 years ago
- jORM is a lightweight Java ORM ☆38 · Updated 7 years ago
- QuartzDesk Executor (QE) is a scalable and generic job scheduling application that can be used to schedule execution of native shell scri… ☆23 · Updated last month