kermitt2/pdfalto

PDF to XML ALTO file converter

☆272

Alternatives and similar repositories for pdfalto

Users interested in pdfalto are comparing it to the libraries listed below.

kermitt2 / pdf2xml
pdf2xml converter based on the Xpdf library (modified version)
☆27 · Feb 23, 2018 · Updated 8 years ago
kermitt2 / biblio-glutton-extension
A browser extension providing Open Access bibliographical services
☆18 · Dec 9, 2022 · Updated 3 years ago
kermitt2 / biblio-glutton
A high performance bibliographic information service: https://biblio-glutton.readthedocs.io
☆150 · Apr 8, 2026 · Updated 3 months ago
filak / hOCR-to-ALTO
Convert between Tesseract hOCR and ALTO XML using XSL stylesheets
☆60 · Mar 20, 2026 · Updated 4 months ago
kermitt2 / biblio_glutton_harvester
Open Access PDF harvester
☆42 · May 3, 2024 · Updated 2 years ago
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆65 · Apr 12, 2026 · Updated 3 months ago
softcite / software-mentions
Softcite software mention recognizer, finding mentions and citations to software from within the academic literature
☆85 · Jun 6, 2026 · Updated last month
grobidOrg / grobid
A machine learning software for extracting information from scholarly documents
☆5,035 · Updated this week
kermitt2 / grobid-example
Some examples of usage of Grobid in a third-party Java project.
☆20 · Jun 14, 2023 · Updated 3 years ago
kermitt2 / grisp
Knowledge Base stuff
☆23 · Mar 1, 2026 · Updated 4 months ago
kermitt2 / entity-fishing
A machine learning tool for fishing entities
☆268 · Feb 27, 2026 · Updated 5 months ago
cneud / ocr-conversion
Conversions between various OCR formats
☆84 · Feb 13, 2026 · Updated 5 months ago
grobidOrg / grobid-client-python
Python client for GROBID Web services
☆410 · Jul 25, 2026 · Updated last week
lfoppiano / grobid-quantities
GROBID extension for identifying and normalizing physical quantities.
☆85 · Apr 8, 2026 · Updated 3 months ago
kermitt2 / delft
A Deep Learning Framework for Text: https://delft.readthedocs.io/
☆416 · Jul 21, 2026 · Updated last week
MedKhem / grobid-dictionaries
☆33 · Nov 16, 2022 · Updated 3 years ago
kermitt2 / datastet
Finding mentions and citations to named and implicit research datasets from within the academic literature
☆31 · Jun 14, 2025 · Updated last year
istex-archives / istex-browser-extension
ISTEX button: a web extension that can dynamically insert, on the web page being viewed, a link to the full text of a document if the latter…
☆11 · May 30, 2023 · Updated 3 years ago
grobidOrg / grobid-ner
A Named-Entity Recogniser based on Grobid.
☆55 · May 14, 2025 · Updated last year
altoxml / schema
ALTO XML schema - latest and all former versions
☆55 · Jul 8, 2026 · Updated 3 weeks ago
kermitt2 / arxiv_harvester
Poor man's simple harvester for arXiv resources
☆14 · Jul 14, 2023 · Updated 3 years ago
PRImA-Research-Lab / prima-page-to-pdf
Java command line tool to convert PAGE XML files with layout and text content to PDF
☆10 · Apr 27, 2020 · Updated 6 years ago
ourresearch / paperbuzz-api
Wrapper for the Crossref Events API
☆24 · May 23, 2023 · Updated 3 years ago
kermitt2 / article_dataset_builder
Open Access PDF harvester, metadata aggregator and full-text ingester
☆62 · May 3, 2024 · Updated 2 years ago
cneud / alto-tools
Python tools for performing various operations on ALTO XML files
☆50 · Jun 12, 2026 · Updated last month
BobLd / DocumentLayoutAnalysis
Document Layout Analysis resources repos for development with PdfPig.
☆637 · Oct 1, 2023 · Updated 2 years ago
allenai / spv2
Science-parse version 2
☆257 · Nov 20, 2019 · Updated 6 years ago
softcite / softcite_kb
A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources
☆18 · May 14, 2023 · Updated 3 years ago
pd3f / dehyphen
📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF
☆39 · Mar 8, 2022 · Updated 4 years ago
lfoppiano / grobid-superconductors
Grobid module for superconductor material and properties extraction
☆23 · May 17, 2025 · Updated last year
UB-Mannheim / ocr-fileformat
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
☆204 · May 21, 2025 · Updated last year
lfoppiano / material-parsers
Material parsers and other tools and scripts, initially developed for Grobid Superconductor
☆14 · Feb 21, 2025 · Updated last year
CeON / CERMINE
Content ExtRactor and MINEr
☆513 · Jun 30, 2022 · Updated 4 years ago
leoba / TEI-2-IIIF
XSLT for converting TEI MsDescription to IIIF manifests
☆13 · Oct 18, 2016 · Updated 9 years ago
elacin / PDFExtract
my take at a PDF text extraction utility
☆15 · Jun 15, 2015 · Updated 11 years ago
rochester-rcl / data-dictionary
☆13 · Sep 4, 2015 · Updated 10 years ago
kermitt2 / xpdf-4.00
☆19 · Apr 6, 2021 · Updated 5 years ago
ScienciaLAB / structure-vision
Viewer for the structure extracted by Grobid on PDF documents
☆57 · Nov 7, 2025 · Updated 8 months ago
PierreSenellart / theoremkb
Collection of tools to extract semantic information from (mathematical) research articles
☆24 · Jul 21, 2026 · Updated last week