apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,480 · Updated this week
Alternatives and similar repositories for tika
Users who are interested in tika are comparing it to the libraries listed below.
- Apache OpenNLP ☆1,575 · Updated this week
- Apache Solr open-source search software ☆1,539 · Updated this week
- Mirror of Apache PDFBox ☆2,989 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,103 · Updated 2 weeks ago
- Apache Lucene open-source search software ☆3,288 · Updated last week
- Apache Lucene and Solr open-source search software ☆4,371 · Updated last year
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,169 · Updated last week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,560 · Updated 4 months ago
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,314 · Updated last week
- This is mavenised Luke: Lucene Toolbox Project ☆1,552 · Updated 5 years ago
- Official Elasticsearch Java Client ☆513 · Updated last week
- Elasticsearch File System Crawler (FS Crawler) ☆1,420 · Updated last week
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,193 · Updated last week
- Apache Freemarker ☆1,072 · Updated last month
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆6,198 · Updated last week
- Java JNA wrapper for Tesseract OCR API ☆1,722 · Updated this week
- Ehcache 3.x line ☆2,076 · Updated last month
- A scalable, mature and versatile web crawler based on Apache Storm ☆953 · Updated this week
- Language Detection Library for Java ☆585 · Updated 3 years ago
- documents4j is a Java library for converting documents into another document format ☆586 · Updated 10 months ago
- Mirror of Apache Mahout ☆2,192 · Updated last week
- MinIO Client SDK for Java ☆1,267 · Updated 2 weeks ago
- High performance non-blocking webserver ☆3,737 · Updated this week
- OpenPDF is an open-source Java library for creating, editing, rendering, and encrypting PDF documents, as well as generating PDFs from HT… ☆4,130 · Updated 2 months ago
- Apache ActiveMQ ☆2,409 · Updated last week
- Takes third-party HTML and produces HTML that is safe to embed in your web application. Fast and easy to configure. ☆917 · Updated last week
- VisualVM is an All-in-One Java Troubleshooting Tool ☆3,174 · Updated 3 months ago
- Aspose.Words for Java examples, plugins and showcases ☆429 · Updated 3 weeks ago
- Java API for GeoIP2 webservice client and database reader ☆851 · Updated this week
- Mirror of Apache HttpClient ☆1,522 · Updated this week