apache / tika-docker
Convenience Docker images for Apache Tika Server
☆175 Updated last month
Alternatives and similar repositories for tika-docker:
Users interested in tika-docker are comparing it to the repositories listed below.
- Apache Tika Server as a Docker Image ☆172 Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆389 Updated 7 months ago
- ☆176 Updated 2 weeks ago
- 📚 Process PDFs, Word documents and more with spaCy ☆500 Updated 3 weeks ago
- A spaCy wrapper for GliNER ☆108 Updated 2 months ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆73 Updated 3 years ago
- Docker files for a dockerized unoserver ☆52 Updated this week
- Pipeline for converting PDFs to raw text with PaddleOCR ☆21 Updated last year
- The hOCR Embedded OCR Workflow and Output Format ☆74 Updated 7 months ago
- ☆23 Updated this week
- A Python library to define and validate data types in Docling. ☆96 Updated this week
- ☆90 Updated 2 weeks ago
- Running Docling as an API service ☆196 Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆85 Updated last week
- Examples for Docker-Solr ☆62 Updated 4 years ago
- PDF to XML ALTO file converter ☆234 Updated this week
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums. ☆217 Updated this week
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆1,014 Updated 2 years ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆49 Updated 6 months ago
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t… ☆27 Updated last month
- Annotate entities directly onto a PDF with automatic OCR for scanned PDFs ☆59 Updated last year
- ☆131 Updated this week
- INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management. ☆620 Updated this week
- Dockerfile to run unoconv as a webservice ☆96 Updated 2 years ago
- Python bindings to PDFium ☆552 Updated 2 weeks ago
- 🦦 weasel: A small and easy workflow system ☆80 Updated 9 months ago
- Improve your OpenSearch, Elasticsearch, Solr, Vectara, Algolia and Custom Search search quality. ☆301 Updated this week
- Software that makes labeling PDFs easy. ☆409 Updated 10 months ago
- Boilerplate Removal using Deep Learning ☆82 Updated 3 years ago
- JSON-NLP Schema for transfer of NLP output using JSON ☆52 Updated 4 years ago