apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,121 · Updated this week
Alternatives and similar repositories for tika
Users that are interested in tika are comparing it to the libraries listed below
- Mirror of Apache PDFBox ☆2,870 · Updated last week
- Apache Solr open-source search software ☆1,431 · Updated this week
- Apache OpenNLP ☆1,526 · Updated last week
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,061 · Updated this week
- Apache Lucene open-source search software ☆3,075 · Updated this week
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,133 · Updated this week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,498 · Updated last week
- JAXB-based Java library for Word docx, PowerPoint pptx, and Excel xlsx files ☆2,239 · Updated this week
- Apache NiFi ☆5,493 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,047 · Updated last week
- Apache Freemarker ☆1,044 · Updated last month
- Java JNA wrapper for Tesseract OCR API ☆1,682 · Updated last week
- Apache Lucene and Solr open-source search software ☆4,374 · Updated 10 months ago
- OpenPDF is a free Java library for creating and editing PDF files, with an LGPL and MPL open source license. OpenPDF is based on a fork of… ☆3,932 · Updated this week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆6,067 · Updated last week
- XDocReport means XML Document reporting. It's a Java API to merge XML documents created with MS Office (docx) or OpenOffice (odt), LibreOffi… ☆1,279 · Updated last month
- VisualVM is an All-in-One Java Troubleshooting Tool ☆3,063 · Updated 2 weeks ago
- This is mavenised Luke: Lucene Toolbox Project ☆1,546 · Updated 5 years ago
- H2 is an embeddable RDBMS written in Java. ☆4,398 · Updated this week
- Ehcache 3.x line ☆2,058 · Updated 2 months ago
- This repository is a fork of apache/incubator-kie-drools. Please use the upstream repository for development. ☆110 · Updated 2 weeks ago
- 🔎 Open source distributed and RESTful search engine. ☆11,192 · Updated last week
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. ☆3,013 · Updated last week
- Apache Calcite ☆4,902 · Updated this week
- documents4j is a Java library for converting documents into another document format ☆577 · Updated 5 months ago
- Official Elasticsearch Java Client ☆486 · Updated this week
- Apache HBase ☆5,366 · Updated this week
- [DEPRECATED] Core Java Library + PDF/A, xtra and XML Worker. Only security fixes will be added; please use iText 7 ☆1,659 · Updated 9 months ago
- Convenience Docker images for Apache Tika Server ☆200 · Updated 3 weeks ago
- Mirror of Apache HttpClient ☆1,495 · Updated this week