ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
☆69Updated 5 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below.
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆179Updated 2 years ago
- The Semantic Scholar Search Reranker☆107Updated 5 years ago
- A Named-Entity Recogniser based on Grobid.☆54Updated 7 months ago
- multimodal document analysis☆166Updated 2 months ago
- GROBID extension for identifying and normalizing physical quantities.☆83Updated 6 months ago
- A basic tool that extracts the structure from the PDF files of scientific articles.☆76Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆26Updated 3 years ago
- LegalCrawler: A tool for automated scraping of English legal corpora☆59Updated 3 years ago
- Framework for information extraction from tables☆40Updated 6 years ago
- A collection of simple tutorials for using Fonduer☆100Updated 5 years ago
- PDF to XML ALTO file converter☆259Updated last week
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g…☆113Updated 11 months ago
- Science-parse version 2☆251Updated 6 years ago
- Finds linguistic patterns effortlessly☆39Updated 2 years ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking☆87Updated 3 years ago
- ☆95Updated 3 years ago
- In the wild extraction of entities that are found using Flair and displayed using a very elegant front-end.☆71Updated 3 years ago
- Toolbox for OCR post-correction☆122Updated 6 years ago
- 🚀GUI for training spaCy models☆55Updated 4 years ago
- spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)☆55Updated 3 years ago
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖☆143Updated 3 years ago
- Functional and structural analysis of tables in research papers (Table disentangling)☆20Updated 8 years ago
- A python module for word inflections designed for use with spaCy.☆93Updated 5 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆95Updated 2 years ago
- Named entity recognition for the legal domain☆42Updated 4 years ago
- Extracting scientific claims from biomedical abstracts (powered by AllenNLP)☆143Updated 4 years ago
- ☆32Updated 3 years ago
- A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata☆169Updated 3 years ago
- Get annotation suggestions for the INCEpTION text annotation platform from spaCy, Sentence BERT, scikit-learn and more. Runs as a web-ser…☆47Updated 2 months ago
- A machine learning tool for fishing entities☆270Updated 7 months ago