apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,357 Updated this week
Alternatives and similar repositories for tika
Users who are interested in tika compare it to the libraries listed below.
- Mirror of Apache PDFBox ☆2,923 Updated this week
- Apache Solr open-source search software ☆1,488 Updated this week
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,121 Updated last week
- Apache Lucene open-source search software ☆3,178 Updated this week
- Elasticsearch File System Crawler (FS Crawler) ☆1,411 Updated this week
- Apache Lucene and Solr open-source search software ☆4,373 Updated last year
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,531 Updated last month
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,170 Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,272 Updated this week
- Emails at the heart of your business logic! ☆960 Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,077 Updated this week
- VisualVM is an All-in-One Java Troubleshooting Tool ☆3,119 Updated 3 weeks ago
- This is mavenised Luke: Lucene Toolbox Project ☆1,550 Updated 5 years ago
- Apache Freemarker ☆1,058 Updated 3 months ago
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆2,555 Updated last year
- Apache Geode ☆2,316 Updated last week
- Ehcache 3.x line ☆2,069 Updated this week
- Apache ActiveMQ Classic ☆2,402 Updated this week
- Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or produ… ☆5,984 Updated this week
- Official Elasticsearch Java Client ☆495 Updated this week
- JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the … ☆5,828 Updated this week
- [DEPRECATED] Core Java Library + PDF/A, xtra and XML Worker. Only security fixes will be added; please use iText 7 ☆1,665 Updated last month
- Apache Ignite ☆4,987 Updated this week
- Apache Hive ☆5,816 Updated this week
- H2 is an embeddable RDBMS written in Java. ☆4,471 Updated 2 weeks ago
- XML/XHTML and CSS 2.1 renderer in pure Java ☆2,153 Updated this week
- An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF sup… ☆2,074 Updated last year
- Apache NiFi ☆5,737 Updated this week
- Apache Shiro ☆4,404 Updated this week
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… ☆6,406 Updated this week