apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,055 Updated last week
Alternatives and similar repositories for tika
Users who are interested in tika compare it to the libraries listed below.
- Mirror of Apache PDFBox ☆2,852 Updated last week
- Apache Calcite ☆4,867 Updated this week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆6,047 Updated this week
- Apache Lucene and Solr open-source search software ☆4,374 Updated 9 months ago
- Mirror of Apache POI ☆2,034 Updated this week
- Apache Lucene open-source search software ☆3,030 Updated this week
- Apache NiFi ☆5,410 Updated this week
- Ehcache 3.x line ☆2,049 Updated last month
- Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ. ☆11,539 Updated this week
- Apache Solr open-source search software ☆1,413 Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,037 Updated 3 months ago
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,117 Updated this week
- Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more ☆3,954 Updated this week
- Code for Quartz Scheduler ☆6,527 Updated 2 months ago
- Apache ZooKeeper ☆12,519 Updated this week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,489 Updated last month
- Apache Druid: a high performance real-time analytics database. ☆13,751 Updated last week
- Azkaban workflow manager. ☆4,501 Updated 11 months ago
- High performance non-blocking webserver ☆3,659 Updated last month
- Apache HBase ☆5,351 Updated this week
- H2 is an embeddable RDBMS written in Java. ☆4,378 Updated 2 months ago
- Java JNA wrapper for Tesseract OCR API ☆1,675 Updated 2 weeks ago
- Apache ActiveMQ Classic ☆2,366 Updated last week
- Apache Freemarker ☆1,035 Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,221 Updated last month
- Apache OpenNLP ☆1,516 Updated last week
- [DEPRECATED] Core Java Library + PDF/A, xtra and XML Worker. Only security fixes will be added — please use iText 7 ☆1,650 Updated 8 months ago
- VisualVM is an All-in-One Java Troubleshooting Tool ☆3,048 Updated last week
- documents4j is a Java library for converting documents into another document format ☆577 Updated 4 months ago
- Emails at the heart of your business logic! ☆943 Updated this week