zanibbi / SymbolScraper
Apache PDFBox extension for precisely extracting character/symbol locations and identities from born-digital PDF files.
☆19 Updated 4 months ago
Alternatives and similar repositories for SymbolScraper
Users who are interested in SymbolScraper are comparing it to the libraries listed below.
- Workshop Home Page for Benchmarking: Past, Present and Future ☆35 Updated 4 years ago
- ☆97 Updated 3 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification ☆179 Updated 2 years ago
- ☆58 Updated 4 years ago
- Direct Attentive Dependency Parser ☆54 Updated last year
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖 ☆144 Updated 3 years ago
- Converter from UD-trees to BART representation ☆36 Updated last year
- Code and material for the AllenNLP Guide ☆86 Updated 2 years ago
- Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/ ☆90 Updated this week
- Code for the paper SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts (AKBC 2021). https://openreview.net/forum?id=OF… ☆29 Updated 4 years ago
- A framework for building semantic parsers (including neural module networks) with AllenNLP, built by the authors of AllenNLP ☆108 Updated 3 years ago
- SciWING is a modern toolkit for scientific document processing from WING-NUS ☆63 Updated 2 years ago
- A project about benchmarking and evaluating existing PDF extraction tools on their semantic abilities to extract the body texts from PDF … ☆69 Updated 5 years ago
- CharBERT: Character-aware Pre-trained Language Model (COLING2020) ☆121 Updated 5 years ago
- ☆40 Updated 4 years ago
- Factorization of the neural parameter space for zero-shot multi-lingual and multi-task transfer ☆39 Updated 5 years ago
- Data/Code Repository for https://api.semanticscholar.org/CorpusID:218470122 ☆138 Updated last year
- Hyperparameter Search for AllenNLP ☆140 Updated 11 months ago
- multimodal document analysis ☆166 Updated 2 months ago
- Helper scripts and notes that were used while porting various nlp models ☆49 Updated 3 years ago
- ☆75 Updated 4 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline. ☆86 Updated 4 years ago
- a large scientific paraphrase dataset for longer paraphrase generation ☆39 Updated 3 years ago
- Implementation of the GBST block from the Charformer paper, in Pytorch ☆118 Updated 4 years ago
- UFSAC is a resource containing all WordNet Sense Annotated Corpora, and a Java library for manipulating them ☆38 Updated 3 years ago
- ☆21 Updated 4 years ago
- LM Pretraining with PyTorch/TPU ☆137 Updated 6 years ago
- ☆14 Updated 3 years ago
- Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters" ☆199 Updated 2 years ago
- A dataset of atomic wikipedia edits containing insertions and deletions of a contiguous chunk of text in a sentence. This dataset contai… ☆105 Updated 6 years ago