Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
☆1,651Mar 28, 2026Updated 3 weeks ago
Alternatives and similar repositories for tika-python
Users that are interested in tika-python are comparing it to the libraries listed below.
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.☆108Apr 9, 2025Updated last year
- Python bindings for Apache Tika☆24Aug 20, 2020Updated 5 years ago
- extract text from any document. no muss. no fuss.☆4,518Apr 3, 2026Updated 2 weeks ago
- The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).☆3,697Apr 13, 2026Updated last week
- Apache Tika Server as a Docker Image☆173Jul 17, 2022Updated 3 years ago
- Python wrapper for Apache Tika, made to be easy_installed☆26Apr 17, 2012Updated 14 years ago
- Community maintained fork of pdfminer - we fathom PDF☆6,952Mar 13, 2026Updated last month
- 💫 Industrial-strength Natural Language Processing (NLP) in Python☆33,473Mar 28, 2026Updated 3 weeks ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six.☆5,298Dec 7, 2022Updated 3 years ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files☆9,929Apr 14, 2026Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP)☆14,370Oct 27, 2025Updated 5 months ago
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community.☆39Apr 15, 2016Updated 10 years ago
- Fuzzy String Matching in Python☆9,259Feb 24, 2023Updated 3 years ago
- NLP, before and after spaCy☆2,239Sep 22, 2023Updated 2 years ago
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.☆10,130Jan 28, 2026Updated 2 months ago
- Topic Modelling for Humans☆16,394Nov 1, 2025Updated 5 months ago
- A dataset downloaded from the deep and scientific web across three major Polar data centers for use in research.☆13Sep 8, 2017Updated 8 years ago
- A system for quickly generating training data with weak supervision☆5,953Apr 10, 2026Updated last week
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame☆2,314Dec 5, 2024Updated last year
- Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and a…☆24,907Updated this week
- Simple PDF text extraction☆1,015Feb 27, 2026Updated last month
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.☆9,485Updated this week
- Extract Keywords from sentence or Replace keywords in sentences.☆5,710Apr 13, 2025Updated last year
- Convenience Docker images for Apache Tika Server☆239Apr 13, 2026Updated last week
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis…☆18,703Apr 10, 2026Updated last week
- This is a REST Server endpoint built using Flask and Python.☆24Nov 16, 2022Updated 3 years ago
- For extracting measurements and related entities from text☆58May 6, 2020Updated 5 years ago
- Hadoop integration code for working with Apache cTAKES☆10Feb 11, 2014Updated 12 years ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows☆12,406Apr 14, 2026Updated last week
- Camelot: PDF Table Extraction for Humans☆3,715Jan 5, 2023Updated 3 years ago
- A Python library to extract tabular data from PDFs☆3,674Updated this week
- A Unified Toolkit for Deep Learning Based Document Image Analysis☆5,714Aug 15, 2024Updated last year
- Open source annotation tool for machine learning practitioners.☆10,626Apr 14, 2026Updated last week
- Module for automatic summarization of text documents and HTML pages.☆3,675Mar 31, 2026Updated 3 weeks ago
- Parallel computing with task scheduling☆13,804Apr 13, 2026Updated last week
- An open-source NLP research library, built on PyTorch.☆11,890Nov 22, 2022Updated 3 years ago
- A natural language modeling framework based on PyTorch☆6,304Oct 17, 2022Updated 3 years ago
- This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading …☆18Jan 27, 2024Updated 2 years ago
- State of the Art Natural Language Processing☆4,127Updated this week