apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆2,525 · Updated this week
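This page does not show how Tika's detection works. As a rough, hypothetical illustration of one ingredient of content-based type detection, the Java sketch below checks a file's leading "magic" bytes against three well-known signatures. `%PDF-` marks a PDF. The OLE2 compound-document header is used by legacy PPT and XLS files. The ZIP header also begins OOXML files such as PPTX and XLSX. The class `MagicSniffer` and its string labels are invented for this example and are not part of Tika's API. Tika itself combines magic-byte patterns with other hints, such as the file name and, for container formats, a look inside the container.

```java
import java.util.Arrays;

/**
 * Toy content sniffer: guesses a coarse file type from its leading "magic" bytes.
 * Hypothetical illustration only; this is NOT Apache Tika's API.
 */
public final class MagicSniffer {

    // PDF files begin with the ASCII text "%PDF-".
    private static final byte[] PDF = {0x25, 0x50, 0x44, 0x46, 0x2D};

    // OLE2 compound documents (legacy .ppt, .xls, .doc) share this 8-byte signature.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP archives (and therefore OOXML .pptx, .xlsx, .docx) begin with "PK\3\4".
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    private MagicSniffer() {}

    /** Returns true if {@code data} is at least as long as {@code prefix} and starts with it. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Returns a hypothetical coarse label: "PDF", "OLE2", "ZIP", or "UNKNOWN". */
    public static String detect(byte[] header) {
        if (startsWith(header, PDF)) return "PDF";
        if (startsWith(header, OLE2)) return "OLE2";
        if (startsWith(header, ZIP)) return "ZIP";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // The first bytes of a file whose header reads "%PDF-1.7".
        byte[] pdfHeader = {0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x37};
        System.out.println(detect(pdfHeader));
    }
}
```

A ZIP match alone cannot distinguish a PPTX from an XLSX or from an ordinary archive. A real detector has to open the container to tell them apart. This sketch deliberately stops at the first few bytes.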
Related projects
Alternatives and complementary repositories for tika
- Mirror of Apache PDFBox ☆2,676 · Updated this week
- Apache Lucene open-source search software ☆2,697 · Updated this week
- iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with … ☆2,012 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆2,923 · Updated 3 weeks ago
- Apache Solr open-source search software ☆1,239 · Updated this week
- Mirror of Apache POI ☆1,957 · Updated this week
- JODConverter automates document conversions using LibreOffice or Apache OpenOffice. ☆1,408 · Updated 2 months ago
- Apache Lucene and Solr open-source search software ☆4,376 · Updated last month
- Extract tables from PDF files ☆1,846 · Updated 2 weeks ago
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… ☆6,158 · Updated this week
- H2 is an embeddable RDBMS written in Java. ☆4,215 · Updated this week
- Apache OpenNLP ☆1,447 · Updated this week
- Java JNA wrapper for Tesseract OCR API ☆1,612 · Updated this week
- VisualVM is an All-in-One Java Troubleshooting Tool ☆2,887 · Updated this week
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log-l… ☆2,537 · Updated last month
- JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files ☆2,119 · Updated last month
- Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs. ☆5,409 · Updated this week
- Apache Hive ☆5,555 · Updated this week
- Emails at the heart of your business logic! ☆893 · Updated this week
- The reliable, generic, fast and flexible logging framework for Java. ☆3,016 · Updated 2 weeks ago
- JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the … ☆5,409 · Updated this week
- Apache Storm ☆6,603 · Updated this week
- Apache HBase ☆5,230 · Updated this week
- Azkaban workflow manager. ☆4,468 · Updated 4 months ago
- An application observability facade for the most popular observability tools. Think SLF4J, but for observability. ☆4,483 · Updated this week
- Runtime code generation for the Java virtual machine. ☆6,289 · Updated last week
- Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or produ … ☆5,570 · Updated this week
- Apache NiFi ☆4,903 · Updated this week
- documents4j is a Java library for converting documents into another document format ☆556 · Updated 3 months ago
- This is a language detection library implemented in plain Java. (aliases: language identification, language guessing) ☆733 · Updated 5 years ago