apache / tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
2,985 · Updated this week

Alternatives and similar repositories for tika

Users who are interested in tika are comparing it to the libraries listed below.
