The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,697, Apr 13, 2026, updated last week
Alternatives and similar repositories for tika
Users that are interested in tika are comparing it to the libraries listed below.
- Mirror of Apache PDFBox (☆3,041, Apr 15, 2026, updated last week)
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. (☆1,651, Mar 28, 2026, updated 3 weeks ago)
- Apache OpenNLP (☆1,593, Apr 15, 2026, updated last week)
- Apache NiFi (☆6,053, Apr 14, 2026, updated last week)
- Apache Nutch is an extensible and scalable web crawler (☆3,149, Apr 14, 2026, updated last week)
- Mirror of Apache POI gitbox. The Java API for Microsoft Documents. (☆2,220, Apr 7, 2026, updated 2 weeks ago)
- 🔎 Open source distributed and RESTful search engine. (☆12,769, updated this week)
- Apache Lucene and Solr open-source search software (☆4,363, Sep 25, 2024, updated last year)
- Apache Calcite (☆5,108, Apr 14, 2026, updated last week)
- A high performance caching library for Java (☆17,595, Apr 15, 2026, updated last week)
- Apache Pulsar - distributed pub-sub messaging system (☆15,203, updated this week)
- Free and Open Source, Distributed, RESTful Search Engine (☆76,526, updated this week)
- Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues. (☆12,634, updated this week)
- A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users. (☆9,205, updated this week)
- Drools is a rule engine, DMN engine and complex event processing (CEP) engine for Java (☆6,242, updated this week)
- An annotation processor for generating type-safe bean mappers (☆7,643, Apr 12, 2026, updated last week)
- Apache Commons Compress (☆394, updated this week)
- MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license. (☆60,711, Feb 12, 2026, updated 2 months ago)
- Apache Lucene open-source search software (☆3,406, updated this week)
- Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases. (☆20,709, updated this week)
- GraalVM compiles applications into native executables that start instantly, scale fast, and use fewer compute resources 🚀 (☆21,555, updated this week)
- Apache Kafka - A distributed event streaming platform (☆32,379, Apr 15, 2026, updated last week)
- Go package for using Apache Tika (☆251, Apr 17, 2025, updated last year)
- Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or produ… (☆6,184, updated this week)
- LangChain4j is an idiomatic, open-source Java library for building LLM-powered applications on the JVM. It offers a unified API over popu… (☆11,664, updated this week)
- jOOQ is the best way to write SQL in Java (☆6,700, updated this week)
- Apache Flink (☆25,943, updated this week)
- Apache Druid: a high performance real-time analytics database. (☆13,975, Apr 15, 2026, updated last week)
- Flyway by Redgate • Database Migrations Made Easy. (☆9,694, Apr 14, 2026, updated last week)
- Redisson - Valkey & Redis Java client. Real-Time Data Platform. Sync/Async/RxJava/Reactive API. Over 50 Valkey and Redis based Java objec… (☆24,309, updated this week)
- Google core libraries for Java (☆51,497, updated this week)
- Convenience Docker images for Apache Tika Server (☆239, Apr 13, 2026, updated last week)
- Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code (☆14,232, updated this week)
- Tesseract Open Source OCR Engine (main repository) (☆73,644, updated this week)
- APM, Application Performance Monitoring System (☆24,787, updated this week)
- Quarkus: Supersonic Subatomic Java. (☆15,622, updated this week)
- Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on… (☆6,584, updated this week)
- Apache Solr open-source search software (☆1,603, updated this week)
- Code for Quartz Scheduler (☆6,712, Mar 20, 2026, updated last month)