JonathanLink/PDFLayoutTextStripper

Converts a PDF file into a text file while preserving the layout of the original PDF. This is useful, for instance, for extracting the contents of a table in a PDF file. It is a subclass of the PDFTextStripper class from the Apache PDFBox library.
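One common way to approximate layout-preserving extraction is to map each positioned piece of text onto a fixed-width character grid, so that table columns stay aligned in the plain-text output. The block below is a minimal, hypothetical Python sketch of that general idea, not PDFLayoutTextStripper's implementation (which is Java code built on PDFBox). The `Fragment` type, the `render_layout` function, and the default `char_width`/`line_height` values are assumptions for illustration, and the text coordinates are assumed to have already been extracted from the PDF.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical: a piece of text with an assumed position on the page, in PDF points."""
    text: str
    x: float  # left edge, measured from the left of the page (assumed non-negative)
    y: float  # baseline, measured from the top of the page (assumed orientation)


def render_layout(fragments, char_width=5.0, line_height=12.0):
    """Place positioned text fragments on a fixed-width character grid.

    Each fragment's x coordinate becomes a column and its y coordinate a row,
    so text that is aligned on the page stays aligned in the output.
    char_width and line_height are illustrative defaults, not PDFBox values.
    """
    rows = {}  # row index -> list of characters
    for frag in fragments:
        row = round(frag.y / line_height)
        col = round(frag.x / char_width)
        line = rows.setdefault(row, [])
        # Pad the row with spaces so the fragment starts at its column.
        end = col + len(frag.text)
        if len(line) < end:
            line.extend(" " * (end - len(line)))
        # Later fragments overwrite earlier ones if they overlap.
        for i, ch in enumerate(frag.text):
            line[col + i] = ch
    if not rows:
        return ""
    # Emit every row between the first and last, keeping vertical gaps as blank lines.
    lines = ["".join(rows.get(r, [])).rstrip() for r in range(min(rows), max(rows) + 1)]
    return "\n".join(lines)


if __name__ == "__main__":
    table = [
        Fragment("Name", 0, 0), Fragment("Qty", 50, 0),
        Fragment("Apple", 0, 12), Fragment("3", 50, 12),
    ]
    print(render_layout(table))
```

With `char_width=5.0`, an x offset of 50 points maps to column 10, so `Qty` and `3` land in the same text column. Overlapping fragments simply overwrite each other in this sketch; a real tool has to handle font widths, rotation, and overlaps far more carefully.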

☆1,608

Alternatives and similar repositories for PDFLayoutTextStripper

Users who are interested in PDFLayoutTextStripper are comparing it to the repositories listed below.

thoqbk / traprange
(Java) A Method to Extract Tabular Content from PDF Files
☆340 · Apr 22, 2023 · Updated 3 years ago
WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,255 · Jun 24, 2022 · Updated 4 years ago
Staffjoy / suite
Staffjoy V1, aka "Suite" - a scheduling application for hundreds of workers
☆856 · Mar 28, 2018 · Updated 8 years ago
BafS / Gutenberg
Modern framework to print the web correctly.
☆4,914 · Feb 3, 2024 · Updated 2 years ago
tabulapdf / tabula-java
Extract tables from PDF files
☆2,036 · Mar 19, 2025 · Updated last year
tabulapdf / tabula
Tabula is a tool for liberating data tables trapped inside PDF files
☆7,446 · Mar 14, 2025 · Updated last year
jostmey / NakedTensor
Bare-bones examples of machine learning in TensorFlow
☆2,403 · Mar 14, 2017 · Updated 9 years ago
alexgreene / WikiQuiz
Generates a quiz for a Wikipedia page using parts of speech and text chunking.
☆801 · Jul 15, 2020 · Updated 6 years ago
k4m4 / terminals-are-sexy
💥 A curated list of Terminal frameworks, plugins & resources for CLI lovers.
☆13,053 · Jul 26, 2024 · Updated last year
marcan / takeover.sh
Wipe and reinstall a running Linux system via SSH, without rebooting. You know you want to.
☆7,330 · Jul 27, 2021 · Updated 4 years ago
dbohdan / structured-text-tools
A list of command-line tools for manipulating structured text data
☆7,138 · Feb 7, 2026 · Updated 5 months ago
nathancahill / Anycomplete
The magic of Google Autocomplete while you're typing. Anywhere.
☆1,537 · May 7, 2023 · Updated 3 years ago
gregdurrett / berkeley-doc-summarizer
The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploit…
☆745 · Feb 25, 2019 · Updated 7 years ago
xo / usql
Universal command-line interface for SQL databases
☆10,041 · Jun 19, 2026 · Updated last month
SamPutnam / Index-2026
☆2,181 · May 21, 2026 · Updated 2 months ago
attic-labs / noms
The versioned, forkable, syncable database
☆7,425 · Aug 27, 2021 · Updated 4 years ago
gitpitch / gitpitch
Markdown Presentations for Tech Conferences, Training, Developer Advocates, and Educators.
☆5,478 · Mar 1, 2021 · Updated 5 years ago
jlsutherland / doc2text
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
☆1,279 · Dec 1, 2020 · Updated 5 years ago
qustavo / httplab
The interactive web server
☆4,150 · Feb 5, 2024 · Updated 2 years ago
esbenp / pdf-bot
🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about …
☆2,641 · Mar 7, 2024 · Updated 2 years ago
aisingapore / TagUI
Free RPA tool by AI Singapore
☆6,312 · Updated this week
forter / security-101-for-saas-startups
security tips for startups
☆4,650 · Jan 27, 2026 · Updated 5 months ago
antonycourtney / tad
A desktop application for viewing and analyzing tabular data
☆3,471 · Mar 5, 2025 · Updated last year
coolwanglu / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆10,609 · Jun 2, 2023 · Updated 3 years ago
MichielDerhaeg / build-linux
A short tutorial about building Linux-based operating systems.
☆5,203 · Jun 3, 2024 · Updated 2 years ago
overshard / timestrap
Time tracking you can host anywhere. Full export support in multiple formats and easily extensible.
☆1,701 · Apr 3, 2023 · Updated 3 years ago
kashav / fsql
Search for files using a fun query language
☆3,985 · Oct 8, 2024 · Updated last year
atlanhq / camelot
Camelot: PDF Table Extraction for Humans
☆3,716 · Jan 5, 2023 · Updated 3 years ago
mzucker / noteshrink
Convert scans of handwritten notes to beautiful, compact PDFs
☆4,841 · Mar 20, 2024 · Updated 2 years ago
dinedal / textql
Execute SQL against structured text like CSV or TSV
☆9,109 · Oct 22, 2023 · Updated 2 years ago
gaubert / gmvault
Gmail backup software
☆3,638 · May 1, 2022 · Updated 4 years ago
ExpediaGroup / cyclotron
A web platform for constructing dashboards.
☆1,542 · Mar 5, 2024 · Updated 2 years ago
sergiotapia / magnetissimo
Web application that indexes all popular torrent sites, and saves it to the local database.
☆3,082 · Jan 19, 2024 · Updated 2 years ago
humphd / have-fun-with-machine-learning
An absolute beginner's guide to Machine Learning and Image Classification with Neural Networks
☆5,110 · Dec 19, 2021 · Updated 4 years ago
uber-archive / image-diff
Create image differential between two images
☆2,440 · Aug 28, 2017 · Updated 8 years ago
astorfi / TensorFlow-World
Simple and ready-to-use tutorials for TensorFlow
☆4,492 · Dec 23, 2020 · Updated 5 years ago
evolus / pencil
The Pencil Project's unique mission is to build a free and opensource tool for making diagrams and GUI prototyping that everyone can use.…
☆9,849 · Jun 2, 2026 · Updated last month
alvarcarto / url-to-pdf-api
Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.
☆7,105 · Jan 18, 2024 · Updated 2 years ago
schollz / howmanypeoplearearound
Count the number of people around you by monitoring wifi signals
☆7,085 · Aug 17, 2024 · Updated last year