janedoesrepo/pdfreader

Extracting Semi-Structured Data from PDFs on a large scale

☆52

Alternatives and similar repositories for pdfreader

Users that are interested in pdfreader are comparing it to the libraries listed below.

woldemarg / unstructured_data_post
test
☆22 · Nov 11, 2020 · Updated 5 years ago
MBAigner / PDFSegmenter
This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…
☆23 · Sep 11, 2020 · Updated 5 years ago
KunstDerFuge / Q-notebook
☆14 · Jul 26, 2021 · Updated 4 years ago
cmu-sei / nabu
Graphical analysis of PDF structure.
☆13 · Jan 9, 2017 · Updated 9 years ago
irgroup / repro_eval
A Python Interface to Reproducibility Measures of System-Oriented IR Experiments
☆11 · Dec 2, 2025 · Updated 7 months ago
hmnth1 / table_ocr
☆13 · Oct 1, 2020 · Updated 5 years ago
AudiTranscribe / AudiTranscribe
An open-source music transcription application.
☆13 · Sep 9, 2023 · Updated 2 years ago
BobLd / simple-docstrum
A step-by-step C# implementation of the Docstrum algorithm
☆24 · Dec 13, 2020 · Updated 5 years ago
virtualsociety / ai-table-recognition
☆39 · Sep 26, 2020 · Updated 5 years ago
the-black-knight-01 / Table-Detection-using-Deep-Learning
Table Detection using Deep Learning
☆27 · May 29, 2021 · Updated 5 years ago
cmacdonald / pyt_splade
☆15 · Jun 26, 2026 · Updated 3 weeks ago
siaen / python_finance_course
CEU python for finance course material
☆22 · Feb 25, 2020 · Updated 6 years ago
8080labs / ipyslickgrid
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks
☆11 · Mar 15, 2022 · Updated 4 years ago
LynnXie00 / ObsidianGanntChartfromTasks
If you use obsidian tasks and dataview, this piece of codeblock will generate mermaid chart automatically
☆15 · Dec 24, 2022 · Updated 3 years ago
madhav1ag / CDeCNet
CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images
☆134 · Sep 11, 2025 · Updated 10 months ago
ContinuumIO / pivottablejs-airgap
pivottablejs for air-gapped systems
☆13 · Jun 25, 2026 · Updated 3 weeks ago
nikolamilosevic86 / TableDisentangler
Functional and structural analysis of tables in research papers (Table disentangling)
☆21 · Aug 7, 2017 · Updated 8 years ago
AmenRa / indxr
A Python utility for indexing file lines. Best demo honourable mention at ECIR 2024.
☆23 · Nov 9, 2025 · Updated 8 months ago
jonaslund / Selfsurfing
☆19 · May 31, 2013 · Updated 13 years ago
jienagu / rpivotTableMD
This is a Shiny app to fetch users' activity and interact with Rmarkdown (pdf/word) report
☆17 · Apr 22, 2019 · Updated 7 years ago
keanudicap / MSQA
Microsoft question-answering dataset
☆10 · Jun 16, 2023 · Updated 3 years ago
dlight / pdftotext
High-level Rust library that binds to Poppler to extract text from a PDF
☆11 · Dec 16, 2020 · Updated 5 years ago
tira-io / tira
The source code for the TIRA Shared Task Platform
☆18 · Jul 14, 2026 · Updated last week
irgroup / Qbias
𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions
☆23 · Mar 1, 2023 · Updated 3 years ago
yahiathen / cutie-for-invoices
Deep neural network to extract intelligent information from invoice documents using PyTorch.
☆16 · Aug 31, 2022 · Updated 3 years ago
openanalytics / clinDataReview
☆12 · Jun 18, 2024 · Updated 2 years ago
Cubicpath / HaloInfiniteGetter
A simple GUI app to get live Halo data straight from Halo Waypoint.
☆12 · Jul 31, 2024 · Updated last year
Tongzhenguo / shanghai_unicom_tourist_tagging
init
☆11 · Sep 30, 2017 · Updated 8 years ago
kasnerz / d2t_iterative_editing
Code for the paper Data-to-Text Generation with Iterative Text Editing
☆14 · Mar 23, 2021 · Updated 5 years ago
kingabzpro / WOLOF-ASR-Wav2Vec2
Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.
☆18 · Nov 13, 2021 · Updated 4 years ago
YipingNUS / OptimSeed
OptimSeed - Seed Word Selection for Weakly-Supervised Text Classification [NAACL SRW 2021]
☆14 · Mar 29, 2021 · Updated 5 years ago
tilde-lab / pyfactxx
Python bindings for upgraded FaCT++ description logic reasoner
☆31 · May 1, 2026 · Updated 2 months ago
hyperspy / hyperspy_gui_ipywidgets
ipywidgets GUI elements for HyperSpy
☆11 · Jul 1, 2026 · Updated 2 weeks ago
baulbo / Diard
From document (PDF) or document images to analysis ready semi-structured data.
☆20 · Nov 4, 2022 · Updated 3 years ago
sufaith / python_weixin
The development of WeChat Python
☆15 · Dec 9, 2020 · Updated 5 years ago
William1617 / DTLN_RKNN
☆10 · May 30, 2024 · Updated 2 years ago
blender-nlp / NewsClaims
☆19 · Sep 10, 2022 · Updated 3 years ago
SafetyGraphics / hep-explorer
Interactive Graphic for Exploring Liver Function Data in Clinical Trials
☆11 · Mar 4, 2023 · Updated 3 years ago
zakaria-29-dev / vuejs-quasar-framework-admin-dashboard-ui
Vuejs - Quasar Framework Admin Dashboard UI Deisin
☆18 · Feb 19, 2021 · Updated 5 years ago