kermitt2/pdf2xml

kermitt2 / pdf2xml

pdf2xml converter based on the Xpdf library (modified version)

☆27

Alternatives and similar repositories for pdf2xml

Users who are interested in pdf2xml compare it to the repositories listed below.

kermitt2 / pdfalto
PDF to XML ALTO file converter
☆272 · Updated this week
kermitt2 / xpdf-4.00
☆19 · Apr 6, 2021 · Updated 5 years ago
IEBH / SRA
Project based at the Bond University Center for Research in Evidence-Based Practice (CREBP) with the aim of drastically reducing the time…
☆15 · Aug 28, 2017 · Updated 8 years ago
parisolab / k3dsurf
Mathematical Software (Now MathMod)
☆11 · Apr 10, 2018 · Updated 8 years ago
kermitt2 / Pub2TEI
Service for converting and enhancing heterogeneous publisher XML formats into TEI
☆65 · Apr 12, 2026 · Updated 3 months ago
kermitt2 / grobid-example
Some examples of usage of Grobid in a third-party Java project.
☆20 · Jun 14, 2023 · Updated 3 years ago
kermitt2 / grobid-astro
Machine learning software for extracting astronomical entities from scholarly documents
☆10 · Oct 31, 2022 · Updated 3 years ago
bio-ontology-research-group / Onto2Graph
Generating graph structures from OWL ontologies
☆12 · Nov 21, 2017 · Updated 8 years ago
helboukkouri / character-bert-pretraining
Code for pre-training CharacterBERT models (as well as BERT models).
☆34 · Sep 6, 2021 · Updated 4 years ago
jinseikenai / uth-bert
Pre-processing text and tokenization for UTH-BERT
☆10 · Sep 30, 2020 · Updated 5 years ago
andikarachman / RNN-Twitter-Sentiment-Analysis
A recurrent neural network model to analyze how travelers expressed their feelings on Twitter
☆12 · Jun 30, 2019 · Updated 7 years ago
glennDittmann / geogram_predicates
Geogram's robust predicates in Rust via cxx.
☆16 · May 11, 2025 · Updated last year
macmillancontentscience / morphemepiece
☆11 · Apr 15, 2022 · Updated 4 years ago
rlcmtzc / SICSS-Python-Crash-Course
The Python crash course of the Summer Institute in Computational Social Science 2022!
☆10 · Nov 19, 2022 · Updated 3 years ago
oscar-project / oscar-website
The website of the Oscar Project
☆11 · Mar 27, 2025 · Updated last year
pjox / gutf
Terminal tool that converts file encodings to UTF-8
☆10 · Oct 5, 2019 · Updated 6 years ago
grobidOrg / grobid-client-python
Python client for GROBID Web services
☆410 · Mar 5, 2026 · Updated 4 months ago
cgg-bern / hex-me-if-you-can
☆15 · Jul 6, 2022 · Updated 4 years ago
jwilk-archive / ocrodjvu
OCR for DjVu
☆46 · Oct 3, 2022 · Updated 3 years ago
Dedsec-Xu / DatasetImgLabel-ICDAR2015
DatasetImgLabeler is an image annotation tool for researchers to prepare datasets in ICDAR2015 format
☆12 · Dec 7, 2019 · Updated 6 years ago
Planteome / plant-stress-ontology
An ontology containing biotic and abiotic plant stresses. Part of the Planteome suite of reference ontologies. Formerly called the Onto…
☆18 · Apr 14, 2026 · Updated 3 months ago
neuged / webanno_tsv
A small Python library to parse and write TSV files generated by the WebAnno software.
☆11 · Apr 14, 2025 · Updated last year
caarlos0-graveyard / github-vacations
Automagically ignore all notifications related to work when you are on vacation
☆21 · Aug 21, 2020 · Updated 5 years ago
jd-coderepos / contributions-ner-cs
This repository hosts the dataset for the paper "Computer Science Named Entity Recognition in the Open Research Knowledge Graph"
☆21 · Jan 8, 2024 · Updated 2 years ago
PRImA-Research-Lab / prima-aletheia-web-emop
Web-based page layout editor created for EMOP (Early Modern OCR Project).
☆11 · May 21, 2021 · Updated 5 years ago
lfoppiano / grobid-quantities
GROBID extension for identifying and normalizing physical quantities.
☆85 · Apr 8, 2026 · Updated 3 months ago
JanaLasser / SICSS-aachen-graz
Repository for the learning materials of the Aachen-Graz SICSS location.
☆19 · Oct 19, 2023 · Updated 2 years ago
Aazhar / keras2tensorflow
Tutorial on running a Keras model in C++ and Python TensorFlow
☆11 · Oct 30, 2018 · Updated 7 years ago
ResearchingDexter / ICDAR2019RecTS
Character recognition, textline recognition
☆10 · Aug 31, 2019 · Updated 6 years ago
kermitt2 / arxiv_harvester
Poor man's simple harvester for arXiv resources
☆14 · Jul 14, 2023 · Updated 3 years ago
munnafaisal / Deep-Object-Search-With-Hash
Search your object with hash
☆12 · Dec 8, 2022 · Updated 3 years ago
stefan-it / gc4lm
GC4LM: A Colossal (Biased) language model for German
☆13 · May 2, 2021 · Updated 5 years ago
syedsaqibbukhari / docanalysis
☆10 · Aug 5, 2019 · Updated 6 years ago
PierreSenellart / theoremkb
Collection of tools to extract semantic information from (mathematical) research articles
☆24 · Updated this week
PRImA-Research-Lab / prima-page-to-pdf
Java command line tool to convert PAGE XML files with layout and text content to PDF
☆10 · Apr 27, 2020 · Updated 6 years ago
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
choonkiatlee / pi-torch
☆10 · Apr 21, 2020 · Updated 6 years ago
OCR-D / ocrd_pagetopdf
OCR-D wrapper for prima-pagetopdf
☆10 · Oct 30, 2025 · Updated 8 months ago
ertugrulcetin / procedure.async
Async procedures for Clojure
☆13 · Oct 5, 2022 · Updated 3 years ago