The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,782 · May 28, 2026 · Updated this week
Alternatives and similar repositories for tika
Users that are interested in tika are comparing it to the libraries listed below.
- Mirror of Apache PDFBox ☆3,067 · Updated this week
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,653 · Updated this week
- Apache OpenNLP ☆1,596 · May 25, 2026 · Updated last week
- Apache NiFi ☆6,097 · Updated this week
- Apache Nutch is an extensible and scalable web crawler ☆3,155 · May 25, 2026 · Updated last week
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. ☆2,231 · Updated this week
- 🔎 Open source distributed and RESTful search engine. ☆13,018 · Updated this week
- Apache Lucene and Solr open-source search software ☆4,361 · May 15, 2026 · Updated 2 weeks ago
- Apache Calcite ☆5,130 · Updated this week
- A high performance caching library for Java ☆17,677 · Updated this week
- Apache Pulsar - distributed pub-sub messaging system ☆15,253 · May 25, 2026 · Updated last week
- Free and Open Source, Distributed, RESTful Search Engine ☆76,759 · Updated this week
- A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users. ☆9,285 · Updated this week
- Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues. ☆12,747 · Updated this week
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java ☆6,262 · May 22, 2026 · Updated last week
- An annotation processor for generating type-safe bean mappers ☆7,660 · May 24, 2026 · Updated last week
- Apache Commons Compress ☆398 · Updated this week
- MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license. ☆61,028 · Apr 24, 2026 · Updated last month
- Apache Lucene open-source search software ☆3,436 · May 25, 2026 · Updated last week
- Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases. ☆20,726 · Updated this week
- Apache Kafka - A distributed event streaming platform ☆32,662 · Updated this week
- GraalVM compiles applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀 ☆21,582 · May 24, 2026 · Updated last week
- LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popu… ☆12,075 · May 23, 2026 · Updated last week
- Go package for using Apache Tika ☆252 · Apr 17, 2025 · Updated last year
- Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or produ… ☆6,214 · Updated this week
- jOOQ is the best way to write SQL in Java ☆6,719 · May 21, 2026 · Updated last week
- Apache Druid: a high performance real-time analytics database. ☆14,010 · Updated this week
- Apache Flink ☆26,032 · Updated this week
- Flyway by Redgate • Database Migrations Made Easy. ☆9,811 · Updated this week
- Redisson: the high-level Java client for Redis and Valkey. Sync/Async/RxJava/Reactive API. Over 50 Valkey and Redis based Java objects an… ☆24,345 · Updated this week
- Google core libraries for Java ☆51,487 · May 23, 2026 · Updated last week
- Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code ☆14,281 · May 24, 2026 · Updated last week
- Tesseract Open Source OCR Engine (main repository) ☆74,295 · Apr 27, 2026 · Updated last month
- APM, Application Performance Monitoring System ☆24,802 · May 24, 2026 · Updated last week
- Quarkus: Supersonic Subatomic Java. ☆15,685 · May 25, 2026 · Updated last week
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… ☆6,569 · May 25, 2026 · Updated last week
- Apache Solr open-source search software ☆1,617 · May 23, 2026 · Updated last week
- Code for Quartz Scheduler ☆6,724 · May 13, 2026 · Updated 2 weeks ago
- The Cloud-Native API Gateway and AI Gateway ☆16,645 · Updated this week