apache / tika-docker
Convenience Docker images for Apache Tika Server
☆181 · Updated 2 weeks ago
Alternatives and similar repositories for tika-docker:
Users who are interested in tika-docker compare it to the repositories listed below.
- Apache Tika Server as a Docker Image ☆172 · Updated 2 years ago
- Running Docling as an API service ☆315 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆109 · Updated 2 weeks ago
- ☆177 · Updated last week
- A python library to define and validate data types in Docling. ☆122 · Updated this week
- 📚 Process PDFs, Word documents and more with spaCy ☆559 · Updated last month
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆389 · Updated 8 months ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆314 · Updated last year
- Pipeline for converting PDFs to raw text with PaddleOCR ☆23 · Updated last year
- Generate BM25 sparse vector inside PostgreSQL ☆65 · Updated 5 months ago
- SmolDocling OCR App built using SmolDocling 256M Model and Streamlit. ☆123 · Updated last month
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t… ☆30 · Updated 2 months ago
- A spaCy wrapper for GliNER ☆112 · Updated 2 months ago
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆192 · Updated 3 weeks ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆293 · Updated last month
- Python bindings to PDFium ☆562 · Updated this week
- Open-source observability for your LLM application. ☆51 · Updated 3 months ago
- Self-hosted web UI for Qdrant ☆269 · Updated 2 weeks ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆509 · Updated last month
- Lightweight, performant, deep table extraction ☆456 · Updated 3 weeks ago
- A lightweight version of Milvus ☆318 · Updated this week
- A powerful command-line OCR tool built with Apple's Vision framework, supporting single image and batch processing with detailed position… ☆105 · Updated 2 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆56 · Updated 7 months ago
- Modular, open source LLMOps stack that separates concerns: LiteLLM unifies LLM APIs, manages routing and cost controls, and ensures high-… ☆93 · Updated 2 months ago
- DuckDB NSQL Model ☆290 · Updated 6 months ago
- Scrape documentation into Meilisearch ☆316 · Updated 3 months ago
- ☆13 · Updated 2 months ago
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order. ☆214 · Updated 11 months ago
- 🦦 weasel: A small and easy workflow system ☆83 · Updated 10 months ago
- ☆108 · Updated this week