apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆2,925 Updated this week
Alternatives and similar repositories for tika:
Users interested in tika are comparing it to the libraries listed below:
- Mirror of Apache PDFBox ☆2,803 Updated this week
- Apache Solr open-source search software ☆1,364 Updated this week
- Mirror of Apache POI ☆2,012 Updated this week
- Apache Lucene and Solr open-source search software ☆4,376 Updated 7 months ago
- Apache Freemarker ☆1,022 Updated last month
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java. ☆6,002 Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,005 Updated 3 weeks ago
- VisualVM is an All-in-One Java Troubleshooting Tool ☆2,994 Updated last week
- Logstash - transport and process your logs, events, or other data ☆14,444 Updated this week
- Flyway by Redgate • Database Migrations Made Easy. ☆8,707 Updated this week
- Apache Lucene open-source search software ☆2,928 Updated this week
- Apache Storm ☆6,617 Updated this week
- Apereo CAS - Identity & Single Sign On for all earthlings and beyond. ☆11,092 Updated this week
- JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the … ☆5,661 Updated last week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,469 Updated 7 months ago
- Apache NiFi ☆5,277 Updated this week
- Apache ZooKeeper ☆12,458 Updated this week
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,199 Updated last week
- Activiti is a light-weight workflow and Business Process Management (BPM) Platform targeted at business people, developers and system adm… ☆10,300 Updated this week
- High performance non-blocking webserver ☆3,636 Updated last month
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… ☆6,302 Updated this week
- documents4j is a Java library for converting documents into another document format ☆574 Updated 2 months ago
- Apache OpenNLP ☆1,507 Updated this week
- Examples and server integrations for generating the Swagger API Specification, which enables easy access to your REST API ☆7,423 Updated 2 weeks ago
- Logback JSON encoder and appenders ☆2,467 Updated last week
- Apache Calcite ☆4,798 Updated this week
- H2 is an embeddable RDBMS written in Java. ☆4,339 Updated this week
- Apache Ignite ☆4,919 Updated this week
- Apache HBase ☆5,328 Updated this week
- TwelveMonkeys ImageIO: Additional plug-ins and extensions for Java's ImageIO ☆1,988 Updated last week