ukwa / docker-pdf2htmlex

Run pdf2htmlEX in a Docker container.

☆25

Alternatives and similar repositories for docker-pdf2htmlex

Users who are interested in docker-pdf2htmlex often compare it to the repositories listed below.

kermitt2 / pdfalto
PDF to XML ALTO file converter
☆254 Updated last month
LogicalSpark / docker-tikaserver
Apache Tika Server as a Docker Image
☆172 Updated 3 years ago
mgedmin / pdf2html
Wrapper for pdftohtml that tries to extract paragraph structure
☆52 Updated 6 years ago
KBNLresearch / keyword-generator
Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc…
☆40 Updated 8 years ago
BMKEG / lapdftextProject
High-level build project for all LAPDF-Text submodules
☆103 Updated 10 years ago
fnl / segtok
Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe…
☆170 Updated 3 years ago
PRImA-Research-Lab / PAGE-XML
PAGE XML format collection for document image page content and more
☆68 Updated 4 years ago
OpenPhilology / nidaba
An expandable and scalable OCR pipeline
☆87 Updated 7 years ago
kermitt2 / grobid-ner
A Named-Entity Recogniser based on Grobid.
☆54 Updated 5 months ago
knmnyn / ParsCit
An open-source CRF Reference String Parsing Package
☆160 Updated 5 years ago
MedKhem / grobid-dictionaries
☆32 Updated 2 years ago
proycon / flat
FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…
☆113 Updated 9 months ago
nikolamilosevic86 / TableDisentangler
Functional and structural analysis of tables in research papers (Table disentangling)
☆20 Updated 8 years ago
bitextor / pdf-extract
PDF parser and converter to HTML
☆89 Updated last year
alexerdmann / HER
Humanities Entity Recognition: robust, practical, efficient Named Entity Recognition for today's digital humanist
☆37 Updated 6 years ago
UB-Mannheim / ocr-gt-tools
Ergonomic line-by-line transcription of scanned text.
☆54 Updated 4 years ago
pydepta / pydepta
A python implementation of DEPTA
☆83 Updated 8 years ago
uoregon-libraries / rais-image-server
RAIS: A IIIF-compliant, 100% open source image server for blazing-fast deep zooming
☆80 Updated 6 months ago
UB-Mannheim / ocromore
Process, enhance and evaluate multiple OCR output.
☆24 Updated last year
natliblux / nautilusocr
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)
☆54 Updated 2 years ago
opener-project / coreference-base
Co-reference resolution for the English language.
☆17 Updated 10 years ago
webanno / webanno
🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the…
☆249 Updated 2 years ago
seomoz / simhash-db-py
Python API for Various DB-Backed Simhash Clusters
☆64 Updated 8 years ago
cisocrgroup / PoCoTo
The CIS OCR PostCorrectionTool
☆44 Updated 2 years ago
WZBSocialScienceCenter / pdf2xml-viewer
A simple viewer and inspection tool for text boxes in PDF documents
☆95 Updated 3 years ago
OCR-D / ocrd_all
Master repository which includes most other OCR-D repositories as submodules
☆72 Updated 3 months ago
poke1024 / simtrie
An efficient data structure for fast string similarity searches
☆22 Updated 4 years ago
ASVLeipzig / cor-asv-ann
OCR-D post-correction with encoder-attention-decoder LSTMs
☆13 Updated 6 months ago
dpapathanasiou / pdfminer-layout-scanner
A more complete example of programming with PDFMiner, which continues where the default documentation stops
☆216 Updated 5 years ago
impactcentre / ocrevalUAtion
OCR evaluation brought to you by University of Alicante
☆66 Updated 3 years ago