SciKnowEngine / lapdftext
LA-PDFText is a system for extracting accurate text from PDF-based research articles, with an interface for improving performance where needed. The system is open-source and provides a simple baseline function for extracting text from primary research articles using rules that developers can customize. This means that the system works qu…
☆15 Updated 6 years ago
Alternatives and similar repositories for lapdftext
Users that are interested in lapdftext are comparing it to the libraries listed below
- Convert between Tesseract hOCR and ALTO XML using XSL stylesheets ☆55 Updated last month
- Service for converting and enhancing heterogeneous publisher XML formats into TEI ☆56 Updated 9 months ago
- Text annotation tool for team collaboration ☆41 Updated last year
- Generating graph structures from OWL ontologies ☆12 Updated 7 years ago
- RightField is an open-source tool for adding ontology term selection to Excel spreadsheets. RightField is used by a 'Template Creator' to… ☆31 Updated 2 months ago
- Conversions between various OCR formats ☆78 Updated 2 years ago
- ☆33 Updated 2 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆39 Updated 3 years ago
- ☆23 Updated last year
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 Updated last year
- The Potential Drug-drug Interaction and Potential Drug-drug Interaction Evidence Ontology (DIDEO) ☆15 Updated last year
- Web interface that allows users to perform computer-assisted text annotation ☆16 Updated 2 years ago
- Keyword extraction with spaCy ☆31 Updated 3 years ago
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models ☆32 Updated last year
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 Updated 2 years ago
- NLP toolkit for those nonsensical ontologies ☆16 Updated last month
- Web-component for creating & showing VSM-sentences — Visual Syntax Method ☆30 Updated 4 years ago
- PAGE XML format collection for document image page content and more ☆67 Updated 3 years ago
- Neuralized version of the Reference String Parser component of the ParsCit package. ☆81 Updated 3 years ago
- ☆22 Updated 2 years ago
- Minimal Named-Entity Recognizer (MER) ☆58 Updated 8 months ago
- The Software Ontology (SWO) is a resource for describing software tools, their types, tasks, versions, licences, provenance and associate… ☆44 Updated 2 years ago
- Basic and Advanced OBO Graphs: specification and reference implementation ☆66 Updated 6 months ago
- The CIS OCR PostCorrectionTool ☆42 Updated 2 years ago
- Tool that converts between BioC XML files and BioC JSON files ☆16 Updated 8 years ago
- PDF to XML ALTO file converter ☆245 Updated 3 weeks ago
- Wrapper for DKPro Core to extract linguistic information from books. ☆16 Updated 3 years ago
- Import information (affiliation, education) from the ORCID database to Wikidata regarding authors of scientific papers ☆15 Updated 2 years ago
- tesseractXplore: a Tesseract ease-of-use GUI with full control ☆23 Updated 3 years ago
- Some examples of usage of Grobid in a third-party Java project. ☆19 Updated 2 years ago