ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
☆68 Updated 4 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below:
- GROBID extension for identifying and normalizing physical quantities. ☆82 Updated last month
- A Named-Entity Recogniser based on Grobid. ☆55 Updated 2 months ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆179 Updated 2 years ago
- The Semantic Scholar Search Reranker ☆109 Updated 4 years ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated… ☆25 Updated 2 years ago
- LegalCrawler: A tool for automated scraping of English legal corpora ☆54 Updated 2 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆74 Updated 3 years ago
- multimodal document analysis ☆165 Updated last year
- ☆91 Updated 3 years ago
- A machine learning tool for fishing entities ☆264 Updated 2 months ago
- A collection of simple tutorials for using Fonduer ☆100 Updated 4 years ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking ☆85 Updated 2 years ago
- PDF to XML ALTO file converter ☆248 Updated 3 weeks ago
- In the wild extraction of entities that are found using Flair and displayed using a very elegant front-end. ☆71 Updated 2 years ago
- Framework for information extraction from tables ☆41 Updated 6 years ago
- Science-parse version 2 ☆245 Updated 5 years ago
- Finds linguistic patterns effortlessly ☆37 Updated last year
- Toolbox for OCR post-correction ☆121 Updated 5 years ago
- Python text processing, pattern matching, and NLP framework ☆66 Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆113 Updated 6 months ago
- ☆139 Updated last year
- 🚀GUI for training spaCy models ☆55 Updated 4 years ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆63 Updated 6 months ago
- An open information extraction system that provides compact extractions ☆92 Updated 3 years ago
- Mining Legal Arguments in Court Decisions - Data and software ☆68 Updated 2 years ago
- Custom Natural Language Processing with big and small models 🌲🌱 ☆68 Updated 3 years ago
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖 ☆142 Updated 3 years ago
- CrowdTruth framework for crowdsourcing ground truth for training & evaluation of AI systems ☆61 Updated last year
- Wikidata embedding ☆50 Updated 8 months ago
- A Python module for word inflections designed for use with spaCy. ☆92 Updated 5 years ago