apache / tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
3,657 · Mar 26, 2026 · Updated last week

Alternatives and similar repositories for tika

Users that are interested in tika are comparing it to the libraries listed below.

