chrismattmann / tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
☆1,640 Updated 8 months ago
Alternatives and similar repositories for tika-python
Users who are interested in tika-python are comparing it to the libraries listed below.
- extract text from any document. no muss. no fuss. ☆4,403 Updated last year
- Simple PDF text extraction ☆969 Updated 10 months ago
- 🪼 a python library for doing approximate and phonetic matching of strings. ☆2,177 Updated last week
- Community maintained fork of pdfminer - we fathom PDF ☆6,837 Updated 2 weeks ago
- Convert Word documents (.docx files) to HTML ☆1,040 Updated last month
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆457 Updated 2 years ago
- NLP, before and after spaCy ☆2,234 Updated 2 years ago
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame