SciKnowEngine / lapdftext
LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance where needed). The system is open-source and provides a simple baseline function for extracting text from primary research articles using rules that developers can customize. This means that the system works qu…
☆15 Updated 6 years ago
Alternatives and similar repositories for lapdftext:
Users who are interested in lapdftext are comparing it to the libraries listed below.
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 Updated last year
- Supervised learning of morphology ☆28 Updated 8 years ago
- Text annotation tool for team collaboration ☆40 Updated last year
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 7 years ago
- Web interface that allows users to perform computer-assisted text annotation ☆16 Updated 2 years ago
- Service for converting and enhancing heterogeneous publisher XML formats into TEI ☆54 Updated 7 months ago
- Table compiling the list of biomedically-related corpora available for named entity recognition (and some also suitable for association d… ☆18 Updated 7 years ago
- A Named-Entity Recogniser based on Grobid. ☆52 Updated 7 months ago
- Match entities between CiteSeerX and other digital libraries ☆12 Updated 5 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆48 Updated 2 years ago
- Convert between Tesseract hOCR and ALTO XML using XSL stylesheets ☆55 Updated 9 months ago
- Finds linguistic patterns effortlessly ☆36 Updated last year
- JavaScript based component for highlighting text-mined annotations of different semantic types in a full text article identified by a PMC… ☆11 Updated 8 years ago
- The CIS OCR PostCorrectionTool ☆42 Updated 2 years ago
- meTypeset is a tool to convert from Microsoft Word .docx format to NLM/JATS-XML for scholarly/scientific article typesetting. ☆91 Updated 2 years ago
- JATS Preview Stylesheets ☆53 Updated last year
- neonion is a user-centered collaborative semantic annotation webapp developed at the Human-Centered Computing group at Freie Universität … ☆68 Updated 6 years ago
- GROBID extension for identifying and normalizing physical quantities. ☆80 Updated 7 months ago
- A deep learning model for extracting references from text ☆28 Updated last year
- Text conversion tool (from e.g. Word, HTML, txt) to corpus formats TEI or FoLiA ☆23 Updated 3 years ago
- Full-featured PDF viewer with enhancements especially for academic papers ☆16 Updated last week
- OpenRefine Reconciliation Framework in Python and Flask ☆21 Updated last year
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- Specification of NAF, the NLP annotation format ☆21 Updated 4 years ago
- A PDF classifier ensemble with REST API service ☆23 Updated 4 years ago
- Multi Tier Annotation Search ☆12 Updated 11 months ago
- ☆24 Updated 3 weeks ago
- Framework for information extraction from tables ☆41 Updated 6 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Utility to translate NIF files across identifier schemes, such as DBpedia and Wikidata ☆12 Updated 5 years ago