apache / tika-docker
Convenience Docker images for Apache Tika Server
☆227 Updated 3 months ago
Alternatives and similar repositories for tika-docker
Users who are interested in tika-docker are comparing it to the libraries listed below.
- Docker files for a dockerized unoserver ☆75 Updated 2 weeks ago
- ☆862 Updated last month
- Running Docling as an API service ☆1,080 Updated 2 weeks ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆329 Updated 2 years ago
- ☆199 Updated this week
- Official Dockerfile for Apache Solr ☆30 Updated last month
- Python bindings to PDFium, reasonably cross-platform. ☆699 Updated last week
- Dockerfile to run unoconv as a webservice ☆96 Updated 3 months ago
- Self-hosted web UI for Qdrant ☆354 Updated last week
- Convert file formats like docx, xlx to other formats like pdf, png - based on jodconverter and libreoffice ☆96 Updated 3 months ago
- ☆186 Updated last week
- 📚 Process PDFs, Word documents and more with spaCy ☆832 Updated 9 months ago
- A Redis server with additional database capabilities powered by Redis modules. ☆227 Updated 2 months ago
- Weaviate Web UI ☆81 Updated 2 years ago
- Simplify DOCX files to JSON ☆257 Updated last year
- Fast integer versions of trained LSTM models ☆586 Updated last year
- A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizi… ☆125 Updated 3 weeks ago
- Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL. ☆335 Updated last year
- A lightweight version of Milvus ☆416 Updated last month
- Official Elastic connectors for third-party data sources ☆123 Updated last week
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆405 Updated last year
- Free and Open Source Plugin that adds enterprise features to Neo4j Community Distributions ☆129 Updated 6 months ago
- Docker Images for the Neo4j Graph Database ☆370 Updated 2 weeks ago
- The open source PII and PHI redaction and de-identification engine ☆79 Updated this week
- ☆857 Updated this week
- A Python library to define and validate data types in Docling. ☆217 Updated 2 weeks ago
- 🦦 weasel: A small and easy workflow system ☆88 Updated last month
- Benchmarking PDF libraries ☆316 Updated 6 months ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆76 Updated 4 years ago
- OCRmyPDF EasyOCR plugin ☆97 Updated 3 months ago