apache / tika-docker
Convenience Docker images for Apache Tika Server
☆163 · Updated last month
Alternatives and similar repositories for tika-docker:
Users who are interested in tika-docker compare it to the repositories listed below.
- Apache Tika Server as a Docker Image ☆171 · Updated 2 years ago
- ☆174 · Updated last week
- Official Dockerfile for Apache Solr ☆27 · Updated last month
- PDF to XML ALTO file converter ☆232 · Updated last week
- 📚 Process PDFs, Word documents and more with spaCy ☆466 · Updated this week
- A spaCy wrapper for GliNER ☆108 · Updated last month
- Improve your OpenSearch, Elasticsearch, Solr, Vectara, Algolia and Custom Search search quality. ☆299 · Updated this week
- Demo of the neural semantic search built with Qdrant ☆154 · Updated last month
- Python bindings to PDFium ☆542 · Updated this week
- The hOCR Embedded OCR Workflow and Output Format ☆74 · Updated 7 months ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆127 · Updated last week
- spaCy REST API, wrapped in a Docker container. ☆266 · Updated 2 years ago
- Towards an open source stack for e-commerce search ☆147 · Updated this week
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆30 · Updated 4 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆259 · Updated this week
- ☆81 · Updated 9 months ago
- A Comprehensive Benchmark for Document Parsing and Evaluation ☆277 · Updated 2 weeks ago
- Running Docling as an API service ☆140 · Updated this week
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆766 · Updated 3 months ago
- Open Source, Distributed, Big Data Enterprise Search Engine ☆69 · Updated 3 weeks ago
- Entity resolution for Elasticsearch. ☆159 · Updated last month
- Lightweight, performant, deep table extraction ☆429 · Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆312 · Updated last year
- User Behavior Insights plugin for OpenSearch ☆23 · Updated this week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆807 · Updated this week
- Working with hOCR in Javascript ☆126 · Updated 2 years ago
- A Redis server with additional database capabilities powered by Redis modules. ☆195 · Updated last month
- Benchmarking PDF libraries ☆263 · Updated last year
- UniTable: Towards a Unified Table Foundation Model ☆440 · Updated 9 months ago
- GROBID extension for identifying and normalizing physical quantities. ☆80 · Updated 5 months ago