The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
☆3,572Feb 20, 2026Updated last week
Alternatives and similar repositories for tika
Users that are interested in tika are comparing it to the libraries listed below
Sorting:
- Mirror of Apache PDFBox☆3,021Feb 19, 2026Updated last week
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.