ukwa / docker-pdf2htmlex
Run pdf2htmlEX in a Docker container.
☆25 Updated last year
Alternatives and similar repositories for docker-pdf2htmlex
Users who are interested in docker-pdf2htmlex compare it to the repositories listed below.
- PDF to XML ALTO file converter ☆254 Updated last month
- Apache Tika Server as a Docker Image ☆172 Updated 3 years ago
- Wrapper for pdftohtml that tries to extract paragraph structure ☆52 Updated 6 years ago
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc… ☆40 Updated 8 years ago
- High-level build project for all LAPDF-Text submodules ☆103 Updated 10 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 3 years ago
- PAGE XML format collection for document image page content and more ☆68 Updated 4 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- A Named-Entity Recogniser based on Grobid. ☆54 Updated 5 months ago
- An open-source CRF Reference String Parsing Package ☆160 Updated 5 years ago
- ☆32 Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 Updated 9 months ago
- Functional and structural analysis of tables in research papers (Table disentangling) ☆20 Updated 8 years ago
- PDF parser and converter to HTML ☆89 Updated last year
- Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist ☆37 Updated 6 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 Updated 4 years ago
- A python implementation of DEPTA ☆83 Updated 8 years ago
- RAIS: A IIIF-compliant, 100% open source image server for blazing-fast deep zooming ☆80 Updated 6 months ago
- Process, enhance and evaluate multiple OCR output. ☆24 Updated last year
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆54 Updated 2 years ago
- Co-reference resolution for the English language. ☆17 Updated 10 years ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the… ☆249 Updated 2 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 Updated 8 years ago
- The CIS OCR PostCorrectionTool ☆44 Updated 2 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 Updated 3 years ago
- Master repository which includes most other OCR-D repositories as submodules ☆72 Updated 3 months ago
- An efficient data structure for fast string similarity searches ☆22 Updated 4 years ago
- OCR-D post-correction with encoder-attention-decoder LSTMs ☆13 Updated 6 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆216 Updated 5 years ago
- OCR evaluation brought to you by University of Alicante ☆66 Updated 3 years ago