allenai/pdffigures


Command line tool to extract figures, tables, and captions from scholarly documents in PDF form.

☆130

Alternatives and similar repositories for pdffigures

Users interested in pdffigures compare it to the libraries listed below.

allenai / pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
☆749 · Mar 10, 2024 · Updated 2 years ago
allenai / deepfigures-open
Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖
☆148 · Jun 14, 2022 · Updated 4 years ago
SeerLabs / pdfmef
Multi-Entity Extraction Framework for Academic Documents (with default extraction tools)
☆31 · Oct 3, 2023 · Updated 2 years ago
allenai / figureseer
☆41 · May 15, 2020 · Updated 6 years ago
pdfliberation / knowledge
A place to collect and share knowledge about liberating data from PDFs
☆55 · Jan 30, 2022 · Updated 4 years ago
domoritz / label_generator
Training data generator for text detection
☆38 · Jul 16, 2020 · Updated 5 years ago
tamirhassan / pdfxtk
PDF Extraction Toolkit
☆43 · Nov 23, 2020 · Updated 5 years ago
data-liberation / table-understanding-dataset
table understanding dataset for comparative evaluation of different table understanding algorithms
☆13 · Jun 15, 2018 · Updated 8 years ago
microsoft / mag-covid19-research-examples
Examples of utilizing Microsoft Academic for conducting COVID-19 research
☆23 · Dec 26, 2022 · Updated 3 years ago
allenai / science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
☆700 · May 26, 2024 · Updated 2 years ago
CeON / CERMINE
Content ExtRactor and MINEr
☆512 · Jun 30, 2022 · Updated 4 years ago
gchrupala / morfette
Supervised learning of morphology
☆28 · Jan 17, 2017 · Updated 9 years ago
bob-carpenter / anno
Models, scripts, and data sets for data annotation (aka coding, aka rating)
☆12 · Mar 9, 2015 · Updated 11 years ago
knmnyn / ParsCit
An open-source CRF Reference String Parsing Package
☆161 · May 6, 2020 · Updated 6 years ago
iandees / detect-baseball-diamonds
Various attempts at scanning aerial imagery to detect baseball diamonds.
☆17 · Jul 13, 2014 · Updated 11 years ago
WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,256 · Jun 24, 2022 · Updated 4 years ago
domoritz / mosaic-motherduck
☆14 · Mar 14, 2024 · Updated 2 years ago
zhw12 / AlgMap
Code for the paper: "Mining Algorithm Roadmap in Scientific Publications" - KDD 2019
☆23 · Jul 22, 2023 · Updated 2 years ago
GMOD / indexedfasta-js
Read FASTA files indexed with .fai indexes. Also supports BGZIP+.gzi
☆12 · May 19, 2026 · Updated last month
shyamupa / xelms
☆19 · Dec 19, 2018 · Updated 7 years ago
oaqa / cse-framework
Configuration Space Exploration Framework
☆16 · Oct 13, 2020 · Updated 5 years ago
vega / vega-loader-arrow
Data loader for the Apache Arrow format.
☆65 · Apr 2, 2026 · Updated 3 months ago
greenelab / hclust
Agglomerative hierarchical clustering in JavaScript
☆19 · Dec 17, 2024 · Updated last year
inukshuk / anystyle.io
☆24 · Mar 3, 2024 · Updated 2 years ago
jupyter-widgets-contrib / anywidget-lite
Prototype your Jupyter Widget in the browser with anywidget and JupyterLite 💡
☆17 · Apr 7, 2025 · Updated last year
gunoodaddy / SharedPainter
Cross-platform multi-user network painting program using Qt and Boost.Asio
☆41 · Oct 15, 2012 · Updated 13 years ago
hms-dbmi / cistrome-explorer
Interactive visual analytic tool for exploring epigenomics data w/ associated metadata, powered by HiGlass and Gosling
☆13 · Nov 10, 2023 · Updated 2 years ago
Azure-Samples / data-lake-store-adls-dot-net-get-started
This sample .Net application shows you how to use the .Net SDK to read and write files to Azure Data Lake Store, and do other filesystem …
☆10 · Oct 18, 2023 · Updated 2 years ago
lisongx / wikidata-elements
Custom HTML elements to reuse Wikidata
☆14 · Jan 6, 2023 · Updated 3 years ago
Automattic / atd-core
Core UI Module for After the Deadline
☆20 · Mar 5, 2022 · Updated 4 years ago
RUCAIBox / Citation-Count-Prediction
this repository contains the dataset and the source code for the EMNLP 2019 paper "A Neural Citation Count Prediction Model based on Peer…
☆10 · Oct 8, 2021 · Updated 4 years ago
mhyfritz / hilbert-curve
2D Hilbert curve mapping in JavaScript
☆16 · Jan 4, 2023 · Updated 3 years ago
MicrosoftDocs / microsoft-academic-services
☆18 · Nov 13, 2024 · Updated last year
gr2m / dream-pdf
A modular JavaScript library to create PDFs
☆11 · Mar 5, 2021 · Updated 5 years ago
sul-dlss / dlme
Digital Library of the Middle East web application, based on Spotlight
☆21 · Updated this week
tabulapdf / tabula-extractor
Extract tables from PDF files
☆358 · May 17, 2016 · Updated 10 years ago
shangjingbo1226 / PL2M
☆16 · Dec 6, 2014 · Updated 11 years ago
mike0sv / Reuters-full-data-set
Full dataset of Reuters composed of 8,551,441 news titles, links and timestamps (Jan 2007 - Aug 2016).
☆22 · Aug 17, 2016 · Updated 9 years ago
konklone / bit.voyage
Allow anyone with a modern browser to stream a 1GB, 10GB, 100GB, or 1TB file over the Internet and into a happy home.
☆32 · Oct 7, 2018 · Updated 7 years ago