kermitt2 / pdfalto
PDF to XML ALTO file converter
☆242 Updated 2 weeks ago
Alternatives and similar repositories for pdfalto
Users who are interested in pdfalto are comparing it to the libraries listed below.
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆189 Updated last month
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ☆139 Updated this week
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆68 Updated 4 years ago
- High-level build project for all LAPDF-Text submodules ☆103 Updated 9 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆178 Updated 2 years ago
- pdf2xml convertor based on Xpdf library - modified version ☆27 Updated 7 years ago
- Conversions between various OCR formats ☆78 Updated 2 years ago
- PAGE XML format collection for document image page content and more ☆67 Updated 3 years ago
- Science-parse version 2 ☆244 Updated 5 years ago
- Logical structure analysis for visually structured documents ☆90 Updated 2 years ago
- Software that makes labeling PDFs easy. ☆415 Updated last year
- A machine learning tool for fishing entities ☆264 Updated last month
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆74 Updated 3 years ago
- A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books. ☆187 Updated 3 weeks ago
- Some examples of usage of Grobid in a third party java project. ☆19 Updated 2 years ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆393 Updated 10 months ago
- Master repository which includes most other OCR-D repositories as submodules ☆73 Updated last month
- Neuralized version of the Reference String Parser component of the ParsCit package. ☆81 Updated 3 years ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆53 Updated 2 years ago
- Service for converting and enhancing heterogeneous publisher XML formats into TEI ☆56 Updated 9 months ago
- Python client for GROBID Web services ☆339 Updated last week
- A step-by-step C# implementation of the Docstrum algorithm ☆23 Updated 4 years ago
- Ergonomic line-by-line transcription of scanned text. ☆52 Updated 4 years ago
- A Named-Entity Recogniser based on Grobid. ☆53 Updated last month
- Collection of OCR-related python tools and wrappers from @OCR-D ☆128 Updated this week
- a Deep Learning Framework for Text https://delft.readthedocs.io/ ☆399 Updated 2 weeks ago
- Working with hOCR in Javascript ☆129 Updated 2 years ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the… ☆245 Updated 2 years ago
- GROBID extension for identifying and normalizing physical quantities. ☆82 Updated last week
- The hOCR Embedded OCR Workflow and Output Format ☆73 Updated 10 months ago