apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆2,861 · Updated this week
Alternatives and similar repositories for tika:
Users who are interested in tika are comparing it to the libraries listed below.
- Mirror of Apache PDFBox ☆2,784 · Updated this week
- Apache Lucene and Solr open-source search software ☆4,374 · Updated 6 months ago
- Mirror of Apache POI ☆2,002 · Updated this week
- Apache NiFi ☆5,201 · Updated this week
- Ehcache 3.x line ☆2,037 · Updated 2 months ago
- Apache Solr open-source search software ☆1,338 · Updated this week
- Azkaban workflow manager. ☆4,490 · Updated 8 months ago
- Mirror of Apache HttpClient ☆1,484 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆2,990 · Updated 2 months ago
- Mirror of Apache Mahout ☆2,163 · Updated this week
- Apache Storm ☆6,612 · Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,181 · Updated 2 weeks ago
- Java implementation of the Aho-Corasick algorithm for efficient string matching ☆913 · Updated 11 months ago
- Java JNA wrapper for Tesseract OCR API ☆1,659 · Updated last month
- Apache HBase ☆5,312 · Updated this week
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,562 · Updated this week
- Apache Drill is a distributed MPP query layer for self describing data ☆1,962 · Updated 2 weeks ago
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆2,547 · Updated 5 months ago
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,077 · Updated this week
- Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. ☆6,474 · Updated last week
- Apache Calcite ☆4,766 · Updated this week
- A platform to build and run apps that are elastic, agile, and resilient. SDK, libraries, and hosted environments. ☆13,126 · Updated last week
- Apache Kylin ☆3,685 · Updated last week
- Apache Lucene open-source search software ☆2,901 · Updated this week
- Java Native Access ☆8,661 · Updated last week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,464 · Updated 6 months ago
- Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more ☆3,902 · Updated this week
- High performance non-blocking webserver ☆3,630 · Updated last week
- documents4j is a Java library for converting documents into another document format ☆569 · Updated last month
- Language Detection Library for Java ☆575 · Updated 2 years ago