The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,595 · Mar 6, 2026 · Updated last week
Alternatives and similar repositories for tika
Users who are interested in tika are comparing it to the libraries listed below.
- Mirror of Apache PDFBox ☆3,025 · Mar 6, 2026 · Updated last week
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,649 · Mar 5, 2026 · Updated last week
- Apache NiFi ☆5,998 · Updated this week
- Apache OpenNLP ☆1,587 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,139 · Feb 27, 2026 · Updated 2 weeks ago
- 🔎 Open source distributed and RESTful search engine. ☆12,505 · Updated this week
- Apache Pulsar - distributed pub-sub messaging system ☆15,157 · Updated this week
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,200 · Mar 5, 2026 · Updated last week
- Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues. ☆12,492 · Updated this week
- Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or produ… ☆6,142 · Mar 5, 2026 · Updated last week
- A high performance caching library for Java ☆17,531 · Mar 2, 2026 · Updated last week
- Apache Calcite ☆5,081 · Mar 5, 2026 · Updated last week
- Apache Lucene and Solr open-source search software ☆4,369 · Sep 25, 2024 · Updated last year
- A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users. ☆9,118 · Updated this week
- Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases. ☆20,705 · Updated this week
- GraalVM compiles applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀 ☆21,503 · Updated this week
- MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license. ☆60,462 · Feb 12, 2026 · Updated last month
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java ☆6,221 · Updated this week
- Free and Open Source, Distributed, RESTful Search Engine ☆76,280 · Updated this week
- Apache Druid: a high performance real-time analytics database. ☆13,959 · Updated this week
- Apache Flink ☆25,853 · Updated this week
- An annotation processor for generating type-safe bean mappers ☆7,624 · Mar 4, 2026 · Updated last week
- Mirror of Apache Kafka ☆32,207 · Updated this week
- Apache Lucene open-source search software ☆3,361 · Updated this week
- Quarkus: Supersonic Subatomic Java. ☆15,515 · Updated this week
- Flyway by Redgate • Database Migrations Made Easy. ☆9,577 · Feb 27, 2026 · Updated last week
- LangChain4j is an open-source Java library that simplifies the integration of LLMs into Java applications through a unified API, providin… ☆11,000 · Updated this week
- Google core libraries for Java ☆51,501 · Updated this week
- APM, Application Performance Monitoring System ☆24,738 · Updated this week
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… ☆6,589 · Updated this week
- Redisson - Valkey & Redis Java client. Real-Time Data Platform. Sync/Async/RxJava/Reactive API. Over 50 Valkey and Redis based Java objec… ☆24,263 · Mar 6, 2026 · Updated last week
- Apache Ignite ☆5,053 · Updated this week
- Apache Superset is a Data Visualization and Data Exploration Platform ☆70,860 · Updated this week
- jOOQ is the best way to write SQL in Java ☆6,677 · Updated this week
- Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code ☆14,177 · Updated this week
- The Cloud-Native API Gateway and AI Gateway ☆16,269 · Mar 6, 2026 · Updated last week
- Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search ☆43,188 · Updated this week
- Vert.x is a tool-kit for building reactive applications on the JVM ☆14,659 · Updated this week
- Apache Doris is an easy-to-use, high performance and unified analytics database. ☆15,088 · Updated this week