apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,138 · Updated this week
Alternatives and similar repositories for tika
Users who are interested in tika are comparing it to the libraries listed below.
- Apache Solr open-source search software ☆1,445 · Updated this week
- Mirror of Apache PDFBox ☆2,886 · Updated last week
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,080 · Updated this week
- Apache Lucene open-source search software ☆3,097 · Updated this week
- Apache NiFi ☆5,585 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,055 · Updated 3 weeks ago
- Apache Lucene and Solr open-source search software ☆4,376 · Updated 10 months ago
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,510 · Updated last week
- Extract tables from PDF files ☆1,957 · Updated 5 months ago
- Apache Freemarker ☆1,048 · Updated last month
- VisualVM is an All-in-One Java Troubleshooting Tool ☆3,082 · Updated last week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆6,093 · Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,252 · Updated 3 weeks ago
- Official Elasticsearch Java Client ☆489 · Updated this week
- Elasticsearch File System Crawler (FS Crawler) ☆1,404 · Updated this week
- Convenience Docker images for Apache Tika Server ☆205 · Updated last month
- Apache HBase ☆5,373 · Updated last week
- H2 is an embeddable RDBMS written in Java. ☆4,420 · Updated last week
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,145 · Updated this week
- Ehcache 3.x line ☆2,062 · Updated this week
- Code for Quartz Scheduler ☆6,586 · Updated this week
- documents4j is a Java library for converting documents into another document format ☆578 · Updated 6 months ago
- MinIO Client SDK for Java ☆1,218 · Updated 3 weeks ago
- Mirror of Apache HttpClient ☆1,500 · Updated this week
- C7 CE enters EOL in October 2025. Please check out C8 https://github.com/camunda/camunda – Flexible framework for workflow and decision a… ☆4,235 · Updated this week
- [DEPRECATED] Core Java Library + PDF/A, xtra and XML Worker. Only security fixes will be added — please use iText 7 ☆1,662 · Updated this week
- MVEL (MVFLEX Expression Language) ☆1,160 · Updated 5 months ago
- JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the … ☆5,785 · Updated last week
- Apache Hive ☆5,762 · Updated this week
- Process Orchestration Framework ☆3,770 · Updated this week