ukwa / docker-pdf2htmlex
Run pdf2htmlEX in a Docker container.
☆25 Updated last year
Alternatives and similar repositories for docker-pdf2htmlex
Users who are interested in docker-pdf2htmlex are comparing it to the repositories listed below.
- Wrapper for pdftohtml that tries to extract paragraph structure ☆50 Updated 6 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated last year
- RAIS: A IIIF-compliant, 100% open source image server for blazing-fast deep zooming ☆78 Updated last month
- Functional and structural analysis of tables in research papers (Table disentangling) ☆20 Updated 7 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 Updated 8 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆74 Updated 3 years ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 7 years ago
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 Updated 3 years ago
- Get semantic HTML from PDFs, recover lost text, tables, data... in bulk. ☆31 Updated 6 months ago
- A library for extracting tables from PDF files ☆89 Updated 4 years ago
- liberate all kinds of data from PDF and other unstructural format and make the information machine-readable and visualizeable for popul… ☆31 Updated 7 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- containerised brat (http://brat.nlplab.org/) ☆51 Updated last year
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆214 Updated 5 years ago
- PAGE XML format collection for document image page content and more ☆67 Updated 3 years ago
- PDF to XML ALTO file converter ☆240 Updated last week
- LanguageCrunch NLP server docker image ☆286 Updated 2 years ago
- BIBFRAME Datastore is a Linked-Data project for managing bibliographic records and operational data focused on libraries and other simila… ☆16 Updated 9 years ago
- Command line tool to extract figures, tables, and captions from scholarly documents in PDF form. ☆130 Updated 7 years ago
- A suite of batches and tools for OCR tasks. ☆71 Updated 2 years ago
- Toolbox for OCR post-correction ☆121 Updated 5 years ago
- Wayward is a Python package that helps to identify characteristic terms from single documents or groups of documents. It can be used for … ☆9 Updated 5 years ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the… ☆245 Updated 2 years ago
- Neuralized version of the Reference String Parser component of the ParsCit package. ☆81 Updated 3 years ago
- Command line OAI-PMH harvester and client with built-in cache. ☆125 Updated this week
- The CIS OCR PostCorrectionTool ☆42 Updated 2 years ago
- Text pattern search using marisa-trie ☆18 Updated 4 months ago
- Experiments mining image collections using OpenCV ☆64 Updated 10 years ago
- go-corenlp is a Golang wrapper for Stanford CoreNLP. ☆30 Updated 5 years ago