apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆2,757 Updated this week
Alternatives and similar repositories for tika:
Users who are interested in tika compare it to the libraries listed below.
- Mirror of Apache PDFBox ☆2,750 Updated this week
- Apache Lucene open-source search software ☆2,862 Updated this week
- Apache Solr open-source search software ☆1,308 Updated this week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆5,949 Updated this week
- Apache OpenNLP ☆1,471 Updated last week
- Apache Nutch is an extensible and scalable web crawler ☆2,971 Updated last month
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… ☆6,246 Updated this week
- MinIO Client SDK for Java ☆1,148 Updated last week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,444 Updated 5 months ago
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,060 Updated this week
- Apache Lucene and Solr open-source search software ☆4,367 Updated 4 months ago
- Core part of Jackson that defines Streaming API as well as basic shared abstractions ☆2,286 Updated last week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆2,547 Updated 4 months ago
- Mirror of Apache POI ☆1,992 Updated this week
- A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache Pinot, Tablesaw, and many others ☆3,616 Updated 3 weeks ago
- Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or produ… ☆5,721 Updated this week
- Ehcache 3.x line ☆2,034 Updated last month
- Apache Druid: a high performance real-time analytics database. ☆13,613 Updated this week
- Azkaban workflow manager. ☆4,489 Updated 7 months ago
- Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ. ☆11,082 Updated this week
- Java JNA wrapper for Tesseract OCR API ☆1,641 Updated this week
- VisualVM is an All-in-One Java Troubleshooting Tool ☆2,948 Updated this week
- Apache Kylin ☆3,678 Updated this week
- Provide support to increase developer productivity in Java when using Elasticsearch. Uses familiar Spring concepts such as a template cla… ☆2,932 Updated this week
- Apache Pinot - A realtime distributed OLAP datastore ☆5,644 Updated this week
- Official Elasticsearch Java Client ☆438 Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,161 Updated last week
- A scalable, mature and versatile web crawler based on Apache Storm ☆901 Updated this week
- pdfHTML is an iText add-on for Java that allows you to easily convert HTML and CSS into standards compliant PDFs that are accessible, sea… ☆240 Updated this week
- Process Orchestration Framework ☆3,473 Updated this week