A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents
☆29Dec 8, 2022Updated 3 years ago
Alternatives and similar repositories for pdf-benchmark
Users who are interested in pdf-benchmark are comparing it to the libraries listed below.
- The Science knowledge graph ontologies, a.k.a. SKGO, is a suite of OWL ontology models to capture the knowledge of scientific research da…☆14Jul 3, 2025Updated 8 months ago
- A script to generate tagged XML citation strings for citation parsing☆20Apr 17, 2020Updated 5 years ago
- An annotation tool for grounding of formulae☆24May 28, 2024Updated last year
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper)☆76Dec 29, 2025Updated 2 months ago
- Notebooks and other course materials for Emory QTM 340 (Fall 2022)☆12Dec 13, 2022Updated 3 years ago
- DITA demo project☆11Sep 28, 2023Updated 2 years ago
- Aligned Neural Topic Model (ANTM) for Exploring Evolving Topics: a dynamic neural topic model that uses document embeddings (data2vec) to…☆37Nov 6, 2023Updated 2 years ago
- Scholarly Big Data Subject Category Classifier☆10Jul 15, 2019Updated 6 years ago
- MS Marco Entity Annotations Disambiguation☆13May 19, 2023Updated 2 years ago
- Token-aware, LangChain-compatible semantic chunker with PDF, markdown, and layout support☆13Jun 28, 2025Updated 8 months ago
- CPU miner for Litecoin and Bitcoin☆16Mar 27, 2014Updated 11 years ago
- A home for additional useful tasks and types for Ant (http://ant.apache.org).☆10May 7, 2024Updated last year
- Accelerating GOT-OCRv2 with VLLM☆11Nov 15, 2024Updated last year
- Enhancing virtual KG access over tabular data with RML and CSVW☆12Jan 7, 2023Updated 3 years ago
- This is the code for reproducing the TABBIE baseline in our paper: "Retrieval-Based Transformer for Table Augmentation"☆12Sep 17, 2025Updated 5 months ago
- functionality on top of an RDF store while accounting for and exploiting the fundamental differences between graph storage and relation…☆12Feb 21, 2024Updated 2 years ago
- Incorporating VIsual LAyout Structures for Scientific Text Classification☆179Mar 18, 2023Updated 2 years ago
- r2Symbols : Direct insertion of over 1000 HTML symbol entities in Rmarkdown, Quarto and Shiny Applications☆10Mar 17, 2023Updated 2 years ago
- Code for EMNLP'20 paper "When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models"☆11Nov 10, 2020Updated 5 years ago
- Representation of XML Schemas in OWL syntax☆10Feb 16, 2016Updated 10 years ago
- Citation Extraction and Classifier☆16Jan 15, 2026Updated last month
- Collection of LaTeX utility packages for scientific documents☆17Sep 13, 2023Updated 2 years ago
- Open Drug Database for Switzerland☆10Updated this week
- Samples for using "convert" protocol to convert various resources to DITA☆14Dec 6, 2021Updated 4 years ago
- The teach.org publishing service for goggles and thimble☆16Dec 16, 2019Updated 6 years ago
- Lit integration to use svelte stores as cross element state management☆12Nov 14, 2023Updated 2 years ago
- Java API for LXD☆12Jan 10, 2018Updated 8 years ago
- A FoundationDB backend plugin for mnesia, based on mnesia_rocksdb☆12Dec 1, 2020Updated 5 years ago
- The qlever command-line tool. With this you can control (almost) everything QLever can do☆64Updated this week
- The web application for GraphDB APIs☆56Updated this week
- A Gradle plugin for running DITA Open Toolkit☆13Jan 12, 2021Updated 5 years ago
- RxDB + SvelteKit☆13Jun 21, 2021Updated 4 years ago
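- Manual Python implementation of two decision tree classification models, based on the ID3 and CART algorithms, without calling public source code or library functions, with an evaluation of their strengths and weaknesses.☆16Jan 8, 2022Updated 4 years ago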