SciKnowEngine / lapdftext
LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance where needed). The system is open-source and provides a simple baseline function for extracting text from primary research articles using rules that developers can customize. This means that the system works qu…
☆15 Updated 6 years ago
Alternatives and similar repositories for lapdftext
Users that are interested in lapdftext are comparing it to the libraries listed below
- Dehyphenation of broken text (mainly German), i.e., extracted from a PDF ☆39 Updated 3 years ago
- Web interface that allows users to perform computer-assisted text annotation ☆16 Updated 2 years ago
- A context-based spellchecker for correcting OCR output. ☆19 Updated 2 years ago
- Service for converting and enhancing heterogeneous publisher XML formats into TEI ☆55 Updated 8 months ago
- Text annotation tool for team collaboration ☆41 Updated last year
- Python Package to reconstruct the original continuous text from PDFs with language models ☆32 Updated last year
- A text annotation plugin for Protege 5+ ☆17 Updated 2 years ago
- Entity linking, entity typing and relation extraction: Matching CSV to a Wikibase instance (e.g., Wikidata) via Meta-lookup ☆70 Updated 4 years ago
- This page is a companion for the paper titled Towards Automatic Structuring and Semantic Indexing of Legal Documents ☆29 Updated 6 years ago
- Convert between Tesseract hOCR and ALTO XML using XSL stylesheets ☆55 Updated 2 weeks ago
- Ontologies of Linguistic Annotation. Machine-readable tagsets and annotation schemata for more than 100 languages. ☆20 Updated 3 weeks ago
- A machine learning tool for fishing entities ☆264 Updated 2 weeks ago
- Javascript based component for highlighting text-mined annotations of different semantic types in a full text article identified by a PMC… ☆11 Updated 8 years ago
- Conversions between various OCR formats ☆78 Updated 2 years ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 Updated last year
- ☆22 Updated 2 years ago
- tesseractXplore a tesseract ease of use gui with full control ☆22 Updated 3 years ago
- Citation Classification using hybrid neural network model for Wikipedia References ☆28 Updated 2 years ago
- Toolkit with state-of-the-art Automatic Terms Recognition methods in Scala ☆35 Updated 6 years ago
- This repository contains the code accompanying the paper "Learning Informative Representations of Biomedical Relations with Latent Variab… ☆14 Updated 3 years ago
- The eNanoMapper ontology ☆19 Updated last month
- Knowledge graph construction: Fast inserts into a Wikibase instance ☆45 Updated 3 years ago
- Multi Tier Annotation Search ☆12 Updated last year
- The Software Ontology (SWO) is a resource for describing software tools, their types, tasks, versions, licences, provenance and associate… ☆44 Updated 2 years ago
- Machine-learning based pipeline relying on LambdaMART currently used in PubMed for relevance (Best Match) searches ☆41 Updated 7 years ago
- The Knowledge Box - A data dependency management framework to help users to publish, find and install data models ☆45 Updated last year
- A suite of batches and tools for OCR tasks. ☆71 Updated 2 years ago
- import information (affiliation, education) from ORCID database to Wikidata regarding authors of scientific papers ☆15 Updated 2 years ago
- Neuralized version of the Reference String Parser component of the ParsCit package. ☆81 Updated 3 years ago
- NLP toolkit for those nonsensical ontologies ☆16 Updated last month