apache / tika-docker
Convenience Docker images for Apache Tika Server
★216 · Updated last month
Alternatives and similar repositories for tika-docker
Users who are interested in tika-docker are comparing it to the libraries listed below.
- Apache Tika Server as a Docker Image ★172 · Updated 3 years ago
- PDF text extraction pipeline: self-hosted, local-first, Docker-based ★329 · Updated 2 years ago
- Running Docling as an API service ★871 · Updated last week
- Docker files for a dockerized unoserver ★74 · Updated last month
- ★197 · Updated this week
- ★825 · Updated last month
- Process PDFs, Word documents and more with spaCy ★784 · Updated 7 months ago
- Simple package to extract text with coordinates from programmatic PDFs ★209 · Updated last week
- Docker Images for the Neo4j Graph Database ★363 · Updated last week
- A python library to define and validate data types in Docling. ★198 · Updated this week
- Official Dockerfile for Apache Solr ★30 · Updated last month
- ★158 · Updated last week
- ★99 · Updated this week
- Self-hosted web UI for Qdrant ★340 · Updated this week
- Python bindings to PDFium, reasonably cross-platform. ★659 · Updated this week
- Elasticsearch File System Crawler (FS Crawler) ★1,414 · Updated this week
- OCRmyPDF EasyOCR plugin ★93 · Updated last month
- Weaviate Web UI ★76 · Updated 2 years ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ★721 · Updated last week
- Extract structured text from pdfs quickly ★614 · Updated 4 months ago
- Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents. ★653 · Updated last month
- ★826 · Updated last week
- ★182 · Updated last week
- PyMuPDF4LLM ★1,089 · Updated last month
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ★1,090 · Updated 6 months ago
- A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizi… ★123 · Updated this week
- weasel: A small and easy workflow system ★87 · Updated last year
- Dockerfile to run unoconv as a webservice ★96 · Updated last month
- PDF to XML ALTO file converter ★254 · Updated last month
- Dedoc is a library (service) for automate documents parsing and bringing to a uniform format. It automatically extracts content, logical … ★617 · Updated last month