ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
☆66 · Updated 4 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark:
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below.
- A Named-Entity Recogniser based on Grobid. ☆52 · Updated 7 months ago
- GROBID extension for identifying and normalizing physical quantities. ☆81 · Updated 7 months ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking ☆85 · Updated 2 years ago
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at… ☆22 · Updated 9 months ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata ☆94 · Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆112 · Updated 3 months ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆38 · Updated 3 years ago
- Mining Legal Arguments in Court Decisions - Data and software ☆68 · Updated last year
- Framework for information extraction from tables ☆41 · Updated 6 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata ☆161 · Updated 2 years ago
- spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results) ☆54 · Updated 2 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆176 · Updated 2 years ago
- An open information extraction system that provides compact extractions ☆91 · Updated 3 years ago
- A machine learning tool for fishing entities ☆264 · Updated last month
- Functional and structural analysis of tables in research papers (Table disentangling) ☆20 · Updated 7 years ago
- Corpus of Open Access articles from multiple fields in Science, Technology, and Medicine. ☆73 · Updated 8 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆74 · Updated 3 years ago
- multimodal document analysis ☆164 · Updated 11 months ago
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖 ☆139 · Updated 2 years ago
- Python text processing, pattern matching, and NLP framework ☆65 · Updated last year
- 🚀 GUI for training spaCy models ☆54 · Updated 3 years ago
- In the wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ☆71 · Updated 2 years ago
- A visualisation tool for spaCy using Hierplane. ☆65 · Updated 2 years ago
- Get annotation suggestions for the INCEpTION text annotation platform from spaCy, Sentence BERT, scikit-learn and more. Runs as a web-ser… ☆45 · Updated 7 months ago
- Inter-annotator agreement for Doccano ☆27 · Updated 5 years ago
- Python tools for interacting with Wikidata ☆153 · Updated last year
- 🧪 Cutting-edge experimental spaCy components and features ☆98 · Updated last year
- Sentence transformers models for spaCy ☆107 · Updated 2 years ago