ckorzen / pdf-text-extraction-benchmark
A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF documents, especially from scientific articles.
☆66Updated 4 years ago
Alternatives and similar repositories for pdf-text-extraction-benchmark:
Users who are interested in pdf-text-extraction-benchmark are comparing it to the libraries listed below:
- GROBID extension for identifying and normalizing physical quantities.☆77Updated 5 months ago
- ☆92Updated 2 years ago
- A Named-Entity Recogniser based on Grobid.☆50Updated 5 months ago
- Framework for information extraction from tables☆41Updated 5 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆175Updated last year
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆26Updated 2 years ago
- Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at…☆22Updated 6 months ago
- A basic tool that extracts the structure from the PDF files of scientific articles.☆74Updated 3 years ago
- A collection of simple tutorials for using Fonduer☆99Updated 4 years ago
- Functional and structural analysis of tables in research papers (Table disentangling)☆20Updated 7 years ago
- spaCy pipeline component for generating spaCy KnowledgeBase Alias Candidates for Entity Linking☆85Updated 2 years ago
- Mining Legal Arguments in Court Decisions - Data and software☆66Updated last year
- 🧪 Cutting-edge experimental spaCy components and features☆96Updated 9 months ago
- In-the-wild extraction of entities that are found using Flair and displayed using a very elegant front end.☆71Updated 2 years ago
- The Semantic Scholar Search Reranker☆104Updated 4 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆37Updated 3 years ago
- multimodal document analysis☆162Updated 8 months ago
- Data and additional information regarding the paper: Contract Discovery. Dataset and a Few-Shot Semantic Retrieval Challenge with Competi…☆30Updated 4 years ago
- Inter-annotator agreement for Doccano☆27Updated 4 years ago
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al.☆96Updated last year
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy☆63Updated 11 months ago
- 🚀GUI for training spaCy models☆54Updated 3 years ago
- A spaCy wrapper of OpenTapioca for named entity linking on Wikidata☆93Updated last year
- spaCy pipeline object for extracting values that correspond to a named entity (e.g., birth dates, account numbers, laboratory results)☆54Updated 2 years ago
- LegalCrawler: A tool for automated scraping of English legal corpora☆53Updated 2 years ago
- Examples for aligning, padding and batching sequence labeling data (NER) for use with pre-trained transformer models☆65Updated 2 years ago
- Streamlit Named Entity Recognition (NER) annotation custom component☆39Updated 2 years ago
- Tool for disambiguating acronyms and abbreviations in text for NLP applications☆21Updated 8 months ago
- Named entity recognition for the legal domain☆41Updated 3 years ago
- ☆30Updated 2 years ago