apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,199 · Updated this week
Alternatives and similar repositories for tika
Users who are interested in tika are comparing it to the libraries listed below.
- Mirror of Apache PDFBox ☆2,903 · Updated this week
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,106 · Updated last week
- Apache Solr open-source search software ☆1,464 · Updated this week
- Apache Lucene open-source search software ☆3,133 · Updated last week
- Java JNA wrapper for Tesseract OCR API ☆1,701 · Updated this week
- Apache Lucene and Solr open-source search software ☆4,375 · Updated 11 months ago
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,519 · Updated 3 weeks ago
- Apache OpenNLP ☆1,536 · Updated last week
- Apache Nutch is an extensible and scalable web crawler ☆3,066 · Updated last week
- Elasticsearch File System Crawler (FS Crawler) ☆1,408 · Updated this week
- Apache NiFi ☆5,657 · Updated this week
- Official Elasticsearch Java Client ☆494 · Updated last week
- High performance non-blocking webserver ☆3,693 · Updated this week
- documents4j is a Java library for converting documents into another document format ☆580 · Updated 7 months ago
- Extract tables from PDF files ☆1,966 · Updated 5 months ago
- Apache Freemarker ☆1,052 · Updated 2 months ago
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,261 · Updated last week
- Apache Druid: a high performance real-time analytics database. ☆13,821 · Updated this week
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,153 · Updated last week
- The reliable, generic, fast and flexible logging framework for Java. ☆3,152 · Updated last month
- Emails at the heart of your business logic! ☆955 · Updated this week
- VisualVM is an All-in-One Java Troubleshooting Tool ☆3,100 · Updated 2 weeks ago
- A scalable, mature and versatile web crawler based on Apache Storm ☆932 · Updated this week
- Ehcache 3.x line ☆2,067 · Updated this week
- This repository is a fork of apache/incubator-kie-drools. Please use upstream repository for development. ☆115 · Updated 2 weeks ago
- Aspose.Words for Java examples, plugins and showcases ☆422 · Updated last week
- Apache Kylin ☆3,748 · Updated this week
- Apache HBase ☆5,383 · Updated last week
- Mirror of Apache HttpClient ☆1,510 · Updated last week
- JTokkit is a Java tokenizer library designed for use with OpenAI models. ☆688 · Updated last week