Norconex / importer
Norconex Importer is a Java library and command-line application that parses a file, whatever its format (HTML, PDF, Word, etc.), and extracts its content as plain text. It also lets you manipulate the extracted text before using it in your own service or application.
☆33 Updated 3 months ago
Alternatives and similar repositories for importer
Users who are interested in importer are comparing it to the libraries listed below.
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… ☆196 Updated this week
- Roostrap is a proven rapid application framework compilation built by putting together Spring Roo, Twitter Bootstrap and Google AppEngine… ☆35 Updated 10 years ago
- Apache POI builder ☆54 Updated 2 years ago
- Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV ☆80 Updated 2 years ago
- A Generic (n-ary) Tree implementation in Java ☆104 Updated 8 years ago
- Please use the luke bundled with lucene! This repo is archived and frozen now. ☆101 Updated 7 years ago
- Annotated Excel parsing library to simplify parsing Excel sheets in Java ☆89 Updated last year
- Java library to use the XML-RPC functionality of WordPress ☆78 Updated 4 years ago
- Java EE Cache Filter ☆36 Updated 6 years ago
- Neuro4j Workflow is a light-weight workflow engine for Java with Eclipse-based development environment. Workflow allows to build reusable… ☆63 Updated 6 years ago
- Java 17 Library with tons of utility classes required in all projects ☆36 Updated 2 weeks ago
- Java2word is a Library to generate MS Word Documents from Java code without any special components. ☆96 Updated 3 years ago
- JODConverter automates document conversions using LibreOffice/OpenOffice.org ☆35 Updated 8 years ago
- SimMetrics is a Similarity Metric Library, based on previous work by http://sourceforge.net/projects/simmetrics/ ☆11 Updated 9 years ago
- Java OCR allows you to perform OCR and bar code recognition on images (JPEG, PNG, TIFF, PDF, etc.) and output as plain text, xml with ful… ☆136 Updated 10 years ago
- BridJ bindings for Tesseract ☆49 Updated 9 years ago
- Office 365 client for Java ☆51 Updated 2 years ago
- Java library for generation and validation of software licenses (forked from OddSource/java-license-manager). ☆12 Updated last year
- A set of reusable Java components that implement functionality common to any web crawler ☆246 Updated last week
- Pivot4J provides a common API for OLAP servers which can be used to build an analytical service frontend with pivot style GUI. ☆129 Updated 3 years ago
- Web Accounting - Pippo Demo ☆26 Updated 7 years ago
- Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file b… ☆211 Updated last week
- YaHP is a Java library that allows you to convert an HTML document into a PDF document. ☆55 Updated 13 years ago
- FoGFaaS: Add serverless computing (FaaS) to iFogSim ☆22 Updated 7 months ago
- jMimeMagic is a Java library for determining the MIME type of files or streams. ☆206 Updated 3 years ago
- API to throttle/rate-limit requests ☆56 Updated 9 years ago
- EventBus system for publishing and subscribing to events within an application ☆33 Updated 2 years ago
- Demo applications for Pippo (http://www.pippo.ro) ☆26 Updated 4 years ago
- Scriptella is an open source ETL (Extract-Transform-Load) and script execution tool written in Java. Note: The project is no longer under… ☆107 Updated 5 months ago
- Similarity or Distance Metrics, e.g. Levenshtein, for Java ☆358 Updated 4 years ago