apache / tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
2,676 · Updated this week

Alternatives and similar repositories for tika:

Users who are interested in tika compare it to the libraries listed below.