Norconex / importer
Norconex Importer is a Java library and command-line application that parses and extracts content from a file as plain text, whatever its format (HTML, PDF, Word, etc.). It also lets you manipulate the extracted text before using it in your own service or application.
☆32 · updated last year
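Norconex Importer's own API is not shown here. The sketch below is a JDK-only illustration of the two-stage idea described above: first extract plain text, then run a chain of manipulations on it. `extractText` and `transform` are hypothetical names that do not come from Norconex. The extraction step is only a crude regex tag stripper for simple HTML. It cannot handle PDF, Word, or real-world HTML the way a proper parser would.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class ImportPipelineSketch {

    // Matches any markup tag such as <p> or </body>.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");
    // Matches runs of whitespace (spaces, tabs, newlines).
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Hypothetical "parse" step: turns a simple HTML string into plain text by replacing tags with spaces. */
    static String extractText(String html) {
        return TAGS.matcher(html).replaceAll(" ");
    }

    /** Hypothetical post-parse step: applies each manipulation, in order, to the extracted text. */
    static String transform(String text, List<UnaryOperator<String>> steps) {
        String result = text;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>Norconex   Importer</p></body></html>";
        List<UnaryOperator<String>> steps = List.of(
            s -> SPACES.matcher(s).replaceAll(" ").trim(), // collapse whitespace and trim the ends
            s -> s.toLowerCase(Locale.ROOT)                // normalize case in a locale-independent way
        );
        System.out.println(transform(extractText(html), steps)); // prints "hello norconex importer"
    }
}
```

Keeping the manipulations as a list of `UnaryOperator<String>` makes each one independent and reorderable. This mirrors, in spirit only, the idea of configuring text manipulations that run after parsing.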
Related projects:
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw… · ☆21 · updated last year
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… · ☆181 · updated this week
- SimMetrics is a similarity metric library, based on previous work by http://sourceforge.net/projects/simmetrics/ · ☆11 · updated 8 years ago
- An implementation of Norconex Committer for Elasticsearch · ☆11 · updated 2 years ago
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine… · ☆35 · updated 9 years ago
- Apache POI builder · ☆53 · updated last year
- The Common Crawl crawler engine and related MapReduce code (2008-2012) · ☆214 · updated last year
- Java library for the Google Finance Historical Prices API · ☆19 · updated 11 years ago
- JODConverter automates document conversions using LibreOffice/OpenOffice.org · ☆35 · updated 7 years ago
- The SQL Processor is an engine that produces ANSI SQL statements and executes them without the need to write Java plumbi… · ☆27 · updated 5 months ago
- Generic Excel file (XLSX) reader using Apache POI · ☆32 · updated 6 years ago
- JSON to Java source code generator for Jackson (see the wiki https://github.com/astav/JsonToJava/wiki/JsonToJava) · ☆94 · updated 11 years ago
- Small application that repeatedly replaces markers in a Microsoft Word document with items taken from a CSV/Microsoft Excel fi… · ☆34 · updated last month
- Enterprise backend as a service · ☆69 · updated 5 years ago
- XML manipulation library in Java built on a fluent API · ☆29 · updated 10 months ago
- Neuro4j Workflow is a lightweight workflow engine for Java with an Eclipse-based development environment. Workflow allows you to build reusable… · ☆60 · updated 5 years ago
- Distributed processing framework for search solutions · ☆81 · updated last year
- (no description) · ☆82 · updated 3 months ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b… · ☆207 · updated last month
- Java library for generating and validating software licenses (forked from OddSource/java-license-manager) · ☆12 · updated 9 months ago
- Powerful, hierarchy-based desktop search engine built on Swing and Lucene · ☆18 · updated 7 years ago
- A modern website and web services framework with built-in async events, HTTP server/client and WebSocket server/client, all powered by Ne… · ☆70 · updated 9 months ago
- Office 365 client for Java · ☆48 · updated last year
- Annotation-based Excel parsing library that simplifies parsing Excel sheets in Java · ☆83 · updated 3 months ago
- Edit a DOCX using CKEditor via an XHTML round trip (with some session state) · ☆47 · updated 6 years ago
- The FSS (file storage service) APIs make storing blob files easy and simple · ☆40 · updated last year
- Please use the Luke bundled with Lucene! This repo is archived and frozen now. · ☆101 · updated 5 years ago
- Mirror of the SourceForge project Expr4j · ☆9 · updated 5 years ago
- A multi-source data importer for Elasticsearch/Algolia supporting data wrangling and both incremental and full loading · ☆31 · updated 6 years ago
- smartcrop implementation in Java · ☆21 · updated 5 years ago