zanibbi / SymbolScraper
Apache PDFBox extension for precisely extracting character/symbol locations and identities from born-digital PDF files.
☆19 Updated 2 months ago
Alternatives and similar repositories for SymbolScraper
Users who are interested in SymbolScraper are comparing it to the libraries listed below.
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖 ☆142 Updated 3 years ago
- ☆94 Updated 3 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆69 Updated 5 years ago
- Workshop Home Page for Benchmarking: Past, Present and Future ☆35 Updated 4 years ago
- Dataset accompanying the SPECTER model ☆141 Updated 2 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆179 Updated 2 years ago
- SciWING is a modern toolkit for scientific document processing from WING-NUS ☆63 Updated 2 years ago
- Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122 ☆135 Updated last year
- NaturalProofs: Mathematical Theorem Proving in Natural Language (NeurIPS 2021 Datasets & Benchmarks) ☆133 Updated 3 years ago
- multimodal document analysis ☆166 Updated this week
- Science-parse version 2 ☆252 Updated 5 years ago
- ☆14 Updated 3 years ago
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆294 Updated last year
- ☆58 Updated 4 years ago
- Direct Attentive Dependency Parser ☆54 Updated last year
- A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP ☆107 Updated 3 years ago
- Neuralized version of the Reference String Parser component of the ParsCit package. ☆81 Updated 3 years ago
- Extracting scientific claims from biomedical abstracts (powered by AllenNLP) ☆144 Updated 4 years ago
- QED: A Framework and Dataset for Explanations in Question Answering ☆118 Updated 4 years ago
- An implementation of GrASP (Shnarch et al., 2017) ☆23 Updated 3 years ago
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper) ☆71 Updated 3 years ago
- ☆141 Updated last year
- Repository for NAACL 2019 paper on Citation Intent prediction ☆125 Updated 5 years ago
- Tools to bulk download arxiv data ☆131 Updated 7 years ago
- LongSumm - Scientific Document Summarization Task ☆74 Updated 3 years ago
- CharBERT: Character-aware Pre-trained Language Model (COLING2020) ☆121 Updated 4 years ago
- ☆40 Updated 4 years ago
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆38 Updated 3 years ago
- Tools for extracting tables and results from Machine Learning papers ☆431 Updated 2 years ago
- Command line tool to extract figures, tables, and captions from scholarly documents in PDF form. ☆130 Updated 7 years ago