A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools together to generate a full XML document.
☆297 · Dec 24, 2025 · Updated 2 months ago
Alternatives and similar repositories for sciencebeam-parser
Users who are interested in sciencebeam-parser are comparing it to the libraries listed below.
- Content ExtRactor and MINEr ☆513 · Jun 30, 2022 · Updated 3 years ago
- A machine learning software for extracting information from scholarly documents ☆4,659 · Feb 21, 2026 · Updated last week
- Science-parse version 2 ☆254 · Nov 20, 2019 · Updated 6 years ago
- Web-based page layout editor created for EMOP (Early Modern OCR Project). ☆11 · May 21, 2021 · Updated 4 years ago
- Universalizing Open-Access Journals & Papers ☆19 · Mar 8, 2017 · Updated 8 years ago
- A Knowledge Base for research software relying on large-scale text mining and curated knowledge sources ☆16 · May 14, 2023 · Updated 2 years ago
- Science Parse parses scientific papers (in PDF form) and returns them in structured form. ☆697 · May 26, 2024 · Updated last year
- Production code for PrePubMed ☆17 · Sep 30, 2019 · Updated 6 years ago
- A repository for materials and issues related to the Joint Roadmap for Open Science Tools (JROST) itself as a community and project. ☆22 · Sep 18, 2018 · Updated 7 years ago
- Authorea's collection of LaTeX-based export styles for scholarly writing ☆20 · Sep 19, 2016 · Updated 9 years ago
- OpenCitations provides in RDF accurate citation information harvested from the scholarly literature. ☆68 · Feb 19, 2018 · Updated 8 years ago
- Perpetual Access To The Scholarly Record ☆121 · Jul 31, 2024 · Updated last year
- MOVED TO https://gitlab.com/crossref/pdfextract ☆510 · Jul 26, 2017 · Updated 8 years ago
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ☆148 · Jun 19, 2025 · Updated 8 months ago
- Automorphism groups for CSets - generalizing the nauty algorithm to a broad class of data structures ☆14 · Oct 30, 2023 · Updated 2 years ago
- Light and dark variants for Visual Studio Code of the Base16 Grayscale theme by Chris Kempson ☆10 · May 11, 2017 · Updated 8 years ago
- Final project for COS 521: Using Hokusai algorithm to approximate frequency counts of hashtags in twitter data stream. ☆12 · Jan 13, 2015 · Updated 11 years ago
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆726 · Mar 10, 2024 · Updated last year
- Medical records you can copy and paste ☆12 · Mar 3, 2023 · Updated 2 years ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 · Oct 3, 2023 · Updated 2 years ago
- Zorba - the NoSQL processor ☆41 · Dec 13, 2023 · Updated 2 years ago
- ☆11 · Oct 12, 2020 · Updated 5 years ago
- ☆10 · Mar 16, 2023 · Updated 2 years ago
- 🫓 A parser for the FlatZinc modelling language ☆14 · Feb 27, 2025 · Updated last year
- produce a stream of citation data coming off Wikimedia ☆12 · Mar 28, 2017 · Updated 8 years ago
- Repository of the Crowdsourced Open Citations Index (CROCI) ☆10 · Mar 19, 2019 · Updated 6 years ago
- Open database of scholarly journals ☆10 · Oct 26, 2022 · Updated 3 years ago
- Planning OpenCon Openly ☆11 · Jan 31, 2018 · Updated 8 years ago
- Lightweight piece tokenization library ☆12 · Apr 15, 2024 · Updated last year
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆460 · Aug 3, 2023 · Updated 2 years ago
- Fast and robust NLP components implemented in Java. ☆53 · Oct 13, 2020 · Updated 5 years ago
- Research Paper Review Notes ☆13 · Oct 26, 2018 · Updated 7 years ago
- Building a Raspberry Pi flight controller ☆12 · Nov 30, 2015 · Updated 10 years ago
- ☆13 · Jul 2, 2017 · Updated 8 years ago
- Core libraries by the PRImA Research Lab ☆16 · Jul 30, 2024 · Updated last year
- R package providing basic command line optional argument parsing ☆12 · Oct 1, 2023 · Updated 2 years ago
- XML Director - XML Content Management ☆16 · Jan 11, 2024 · Updated 2 years ago
- Emacs mode for editing MiniZinc model file ☆12 · Apr 26, 2023 · Updated 2 years ago
- ☆14 · Feb 8, 2019 · Updated 7 years ago