apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆2,676 · Updated this week
Alternatives and similar repositories for tika:
Users who are interested in tika are comparing it to the libraries listed below.
- Mirror of Apache PDFBox ☆2,729 · Updated this week
- Mirror of Apache POI ☆1,980 · Updated this week
- Apache Solr open-source search software ☆1,285 · Updated this week
- Apache ActiveMQ Classic ☆2,331 · Updated last week
- Apache Lucene open-source search software ☆2,805 · Updated this week
- Apache Lucene and Solr open-source search software ☆4,371 · Updated 3 months ago
- Apache Freemarker ☆991 · Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,143 · Updated last month
- Java JNA wrapper for Tesseract OCR API ☆1,635 · Updated last month
- Elasticsearch File System Crawler (FS Crawler) ☆1,368 · Updated this week
- Official Elasticsearch Java Client ☆431 · Updated this week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆5,928 · Updated this week
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,054 · Updated this week
- This is the mavenised Luke: Lucene Toolbox Project ☆1,542 · Updated 4 years ago
- Mirror of Apache HttpClient ☆1,475 · Updated last week
- Apache NiFi ☆5,027 · Updated this week
- Apache Curator ☆3,124 · Updated this week
- 🔎 Open source distributed and RESTful search engine. ☆10,075 · Updated this week
- Ehcache 3.x line ☆2,034 · Updated this week
- C7 CE enters EOL in October 2025. Please check out C8 https://github.com/camunda/camunda – Flexible framework for workflow and decision a… ☆4,154 · Updated this week
- Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more ☆3,885 · Updated this week
- Apache Avro is a data serialization system. ☆2,979 · Updated this week
- documents4j is a Java library for converting documents into another document format ☆563 · Updated 5 months ago
- A web front end for an elastic search cluster ☆9,432 · Updated 3 years ago
- RabbitMQ Java client ☆1,256 · Updated this week
- VisualVM is an All-in-One Java Troubleshooting Tool ☆2,924 · Updated 2 weeks ago
- Extract tables from PDF files ☆1,871 · Updated last month
- Elasticsearch Java Rest Client. ☆2,114 · Updated last year
- Provide support to increase developer productivity in Java when using Elasticsearch. Uses familiar Spring concepts such as a template cla… ☆2,923 · Updated this week
- Emails at the heart of your business logic! ☆911 · Updated this week