apache / tika-docker
Convenience Docker images for Apache Tika Server
★191 Updated last week
Alternatives and similar repositories for tika-docker
Users who are interested in tika-docker are comparing it to the libraries listed below.
- Running Docling as an API service ★479 Updated this week
- Process PDFs, Word documents and more with spaCy ★644 Updated 3 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ★1,220 Updated last week
- ★186 Updated last week
- A Redis server with additional database capabilities powered by Redis modules. ★209 Updated 3 weeks ago
- ★750 Updated 2 months ago
- Official Dockerfile for Apache Solr ★28 Updated 3 months ago
- A Python library to define and validate data types in Docling. ★148 Updated this week
- ★93 Updated this week
- Docker image with jodconverter + libreoffice for document conversion through a REST api ★91 Updated last year
- A Model Context Protocol (MCP) server for interacting with Meilisearch through LLM interfaces. ★100 Updated this week
- 🦦 weasel: A small and easy workflow system ★84 Updated 11 months ago
- ★19 Updated 4 months ago
- Optional Rust Extensions to Speed Up the Python Driver ★35 Updated last week
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ★208 Updated 2 months ago
- PDF to XML ALTO file converter ★244 Updated 2 weeks ago
- Parse PDFs into markdown using Vision LLMs ★392 Updated 4 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ★394 Updated 10 months ago
- A spaCy wrapper for GliNER ★116 Updated 4 months ago
- Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) int… ★622 Updated 3 months ago
- Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms. ★381 Updated 2 weeks ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ★31 Updated 8 months ago
- Python bindings to PDFium ★586 Updated this week
- Docker files for a dockerized unoserver ★61 Updated last week
- Alpine Linux based Elasticsearch Docker Image ★191 Updated last year
- PDF text extraction pipeline: self-hosted, local-first, Docker-based ★321 Updated last year
- ★125 Updated this week
- A proxy server for multiple ollama instances with Key security ★449 Updated this week
- Docker Images for the Neo4j Graph Database ★354 Updated last week
- Towards an open source stack for e-commerce search ★149 Updated 3 months ago