ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
☆68Updated 4 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below.
- GROBID extension for identifying and normalizing physical quantities.☆83Updated 3 weeks ago
- A Named-Entity Recogniser based on Grobid.☆55Updated 2 months ago
- ☆91Updated 3 years ago
- Get annotation suggestions for the INCEpTION text annotation platform from spaCy, Sentence BERT, scikit-learn and more. Runs as a web-ser…☆46Updated 9 months ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking☆85Updated 2 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles.☆74Updated 3 years ago
- Framework for information extraction from tables☆41Updated 6 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆25Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…☆113Updated 5 months ago
- Finds linguistic patterns effortlessly☆36Updated last year
- PDF to XML ALTO file converter☆246Updated this week
- A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.☆95Updated 3 years ago
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at…☆22Updated 11 months ago
- A Python module for word inflections designed for use with spaCy.☆92Updated 5 years ago
- A machine learning tool for fishing entities☆263Updated last month
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖☆141Updated 3 years ago
- LegalCrawler: A tool for automated scraping of English legal corpora☆54Updated 2 years ago
- The Semantic Scholar Search Reranker☆109Updated 4 years ago
- Neuralized version of the Reference String Parser component of the ParsCit package.☆81Updated 3 years ago
- A Super-Lightweight Annotation Tool for Experts: Label text in a terminal with just Python☆111Updated last month
- A visualisation tool for spaCy using Hierplane.☆65Updated 2 years ago
- Regex-like pattern matching on sentence trees instead of strings☆42Updated 7 years ago
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models☆32Updated last year
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆94Updated 2 years ago
- multimodal document analysis☆166Updated last year
- Functional and structural analysis of tables in research papers (Table disentangling)☆20Updated 7 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆38Updated 3 years ago
- Inter-annotator agreement for Doccano☆27Updated 5 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆179Updated 2 years ago
- An open information extraction system that provides compact extractions☆92Updated 3 years ago