apache / tika-docker
Convenience Docker images for Apache Tika Server
☆155 · Updated 2 weeks ago
Alternatives and similar repositories for tika-docker:
Users interested in tika-docker are comparing it to the libraries listed below.
- Official Dockerfile for Apache Solr ☆26 · Updated last month
- A Python library to define and validate data types in Docling. ☆71 · Updated this week
- Running Docling as an API service ☆98 · Updated this week
- ☆173 · Updated this week
- Python bindings to PDFium ☆522 · Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆311 · Updated last year
- A spaCy wrapper for GliNER ☆108 · Updated 3 weeks ago
- Generate BM25 sparse vectors inside PostgreSQL ☆62 · Updated 3 months ago
- A curated list of awesome things related to Gotenberg. ☆136 · Updated 3 weeks ago
- PDF to XML ALTO file converter ☆224 · Updated this week
- ☆56 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆68 · Updated this week
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆102 · Updated this week
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆31 · Updated 5 months ago
- A component orchestration engine ☆28 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆244 · Updated this week
- 📚 Process PDFs, Word documents and more with spaCy ☆412 · Updated last month
- ☆22 · Updated 8 months ago
- RAG Citation enhances Retrieval-Augmented Generation (RAG) by automatically generating relevant citations for AI-generated content. It en… ☆24 · Updated 3 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆66 · Updated 6 months ago
- Demo of the neural semantic search built with Qdrant ☆149 · Updated last week
- ☆79 · Updated 8 months ago
- ☆75 · Updated this week
- Towards an open source stack for e-commerce search ☆147 · Updated this week
- Extract structured text from PDFs quickly ☆418 · Updated this week
- AI Server ☆76 · Updated this week
- ☆648 · Updated last week
- Docker files for a dockerized unoserver ☆52 · Updated this week
- Web interface for recognizing text, proofreading OCR, and creating fully digitized documents. ☆146 · Updated last week
- Translate files using Argos Translate ☆17 · Updated 3 months ago