apache / tika-docker
Convenience Docker images for Apache Tika Server
☆234 · Updated last month
Alternatives and similar repositories for tika-docker
Users who are interested in tika-docker are comparing it to the libraries listed below.
- PDF text extraction pipeline: self-hosted, local-first, Docker-based · ☆328 · Updated 2 years ago
- Docker files for a dockerized unoserver · ☆76 · Updated 2 weeks ago
- ☆879 · Updated 2 months ago
- Running Docling as an API service · ☆1,177 · Updated last week
- Python bindings to PDFium, reasonably cross-platform. · ☆719 · Updated last week
- ☆201 · Updated last week
- Official Dockerfile for Apache Solr · ☆30 · Updated 2 weeks ago
- Process PDFs, Word documents and more with spaCy · ☆847 · Updated 10 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. · ☆407 · Updated last year
- A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizi… · ☆125 · Updated 3 weeks ago
- Scrape documentation into Meilisearch · ☆337 · Updated 2 weeks ago
- Self-hosted web UI for Qdrant · ☆363 · Updated this week
- OCRmyPDF EasyOCR plugin · ☆96 · Updated 4 months ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. · ☆1,641 · Updated 9 months ago
- A simple Next.js frontend to explore your local weaviate collections and data · ☆40 · Updated 7 months ago
- Source for the official Caddy v2 Docker Image · ☆537 · Updated 2 weeks ago
- Docling core data types and transformations · ☆225 · Updated last week
- Weaviate Web UI · ☆80 · Updated 2 years ago
- Simple package to extract text with coordinates from programmatic PDFs · ☆236 · Updated last week
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… · ☆41 · Updated 10 months ago
- Convert file formats like docx, xlx to other formats like pdf, png - based on jodconverter and libreoffice · ☆99 · Updated 4 months ago
- Milvus Command Line · ☆116 · Updated 2 weeks ago
- ☆20 · Updated 11 months ago
- Docker Images for the Neo4j Graph Database · ☆373 · Updated this week
- PDF to XML ALTO file converter · ☆261 · Updated 2 weeks ago
- Extract structured text from pdfs quickly · ☆656 · Updated 7 months ago
- Mattermost Agents plugin supporting multiple LLMs · ☆200 · Updated this week
- ☆106 · Updated this week
- A docker job scheduler (aka. crontab for docker) · ☆313 · Updated 2 years ago
- Benchmarking PDF libraries · ☆321 · Updated 7 months ago