apache / tika-docker
Convenience Docker images for Apache Tika Server
☆220 · Updated 2 months ago
Alternatives and similar repositories for tika-docker
Users interested in tika-docker are comparing it to the libraries listed below.
- Docker files for a dockerized unoserver ☆75 · Updated this week
- Official Dockerfile for Apache Solr ☆31 · Updated 2 weeks ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆329 · Updated 2 years ago
- A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizi… ☆124 · Updated last week
- ☆833 · Updated 2 weeks ago
- Python bindings to PDFium, reasonably cross-platform. ☆675 · Updated this week
- Extract structured text from pdfs quickly ☆624 · Updated 5 months ago
- Philter redacts sensitive information such as PII and PHI in text. ☆25 · Updated last week
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆403 · Updated last year
- Entity resolution for Elasticsearch. ☆163 · Updated last month
- ☆197 · Updated last week
- Docker Images for the Neo4j Graph Database ☆368 · Updated 2 weeks ago
- Simplify DOCX files to JSON ☆256 · Updated last year
- OCRmyPDF EasyOCR plugin ☆93 · Updated 2 months ago
- A lightweight version of Milvus ☆390 · Updated last month
- Apache Tika Server with Tesseract 4 Docker Setup ☆23 · Updated 4 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆275 · Updated 3 years ago
- Mattermost Agents plugin supporting multiple LLMs ☆190 · Updated this week
- Convert file formats like docx, xlx to other formats like pdf, png - based on jodconverter and libreoffice ☆96 · Updated 2 months ago
- ☆20 · Updated 9 months ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆32 · Updated last year
- Source for the official Caddy v2 Docker Image ☆521 · Updated 3 weeks ago
- Towards an open source stack for e-commerce search ☆150 · Updated last month
- PDF to XML ALTO file converter ☆257 · Updated 2 weeks ago
- SALI LMSS: Legal Matter Standard Specification ☆68 · Updated 7 months ago
- INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management. ☆667 · Updated last week
- A simple Next.js frontend to explore your local weaviate collections and data ☆38 · Updated 5 months ago
- Dockerfile to run unoconv as a webservice ☆96 · Updated last month
- Generate BM25 sparse vector inside PostgreSQL ☆87 · Updated last year
- A curated list of awesome things related to Gotenberg. ☆185 · Updated last month