dpapathanasiou / pdfminer-layout-scanner
A more complete example of programming with PDFMiner, which continues where the default documentation stops
☆216 · Dec 3, 2019 · Updated 6 years ago
Alternatives and similar repositories for pdfminer-layout-scanner
Users who are interested in pdfminer-layout-scanner are comparing it to the libraries listed below.
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,303 · Dec 7, 2022 · Updated 3 years ago
- Investigative tool for extracting relevant areas from many documents ☆14 · Nov 17, 2015 · Updated 10 years ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆461 · Aug 3, 2023 · Updated 2 years ago
- Turns legal citations in the DOM into links ☆20 · Mar 15, 2017 · Updated 8 years ago
- Community maintained fork of pdfminer - we fathom PDF ☆6,889 · Feb 5, 2026 · Updated last week
- Analyze XML extracted from PDFs (e.g. from TET or PDFMiner) ☆20 · Jan 11, 2018 · Updated 8 years ago
- A fast and friendly PDF scraping library. ☆783 · Oct 17, 2023 · Updated 2 years ago
- A PDFMiner wrapper to ease the text extraction from pdf files. ☆25 · Apr 25, 2013 · Updated 12 years ago
- Utility to re-structure research papers published in US Letter or A4 format PDF files to typically remove the 2 columns layout. ☆53 · Nov 8, 2010 · Updated 15 years ago
- A python script that looks for special lines in a markdown file and uses those lines to convert, clean up, and insert content from URLs i… ☆16 · Dec 9, 2012 · Updated 13 years ago
- Python Automated Testing on Mac ☆17 · Nov 14, 2014 · Updated 11 years ago
- Command-line tool for exploring the PAC donor-recipient relationship ☆55 · Dec 18, 2014 · Updated 11 years ago
- Read data from scanned PDFs in small pieces and write to excel file ☆13 · Oct 5, 2013 · Updated 12 years ago
- It includes projects related to self driving car/ Autonomous car. ☆11 · Mar 2, 2019 · Updated 6 years ago
- Open Source Social Media Monitoring And Engagement System Core/API ☆37 · Sep 8, 2014 · Updated 11 years ago
- A simple script to create geo-tagged image chips from high-resolution RS images for training deep learning models such as U-net. ☆14 · Jun 29, 2021 · Updated 4 years ago
- An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR ☆15 · Dec 4, 2021 · Updated 4 years ago
- The core of sunlightlabs' Data Commons project. Includes the Transparency Data site and the APIs that power TransparencyData.com and Infl… ☆38 · Oct 10, 2016 · Updated 9 years ago
- Use SQL to instantly query stories, users and other items from Hacker News. Open source CLI. No DB required. ☆18 · Oct 13, 2025 · Updated 4 months ago
- A collection of small scripts to do various things ☆32 · Jun 29, 2015 · Updated 10 years ago
- PDF Extraction Toolkit ☆42 · Nov 23, 2020 · Updated 5 years ago
- an unofficial code for augment-XY-CUT in XYLayoutLM ☆30 · Jul 12, 2022 · Updated 3 years ago
- The simplest way to extract text from PDFs in Python ☆428 · Jul 7, 2022 · Updated 3 years ago
- Vietnamese handwritten text recognition system ☆17 · May 2, 2021 · Updated 4 years ago
- A zero-shot captcha solver. ☆16 · Dec 22, 2023 · Updated 2 years ago
- Documentation and use cases for ALTO XML ☆42 · Sep 10, 2018 · Updated 7 years ago
- Drop-in replacement for Pythonista ui.TextView, with convenience features for markdown editing and HTML view mode. ☆39 · Jun 25, 2021 · Updated 4 years ago
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,253 · Jun 24, 2022 · Updated 3 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 · May 12, 2021 · Updated 4 years ago
- A library for extracting tables from PDF files ☆89 · Sep 27, 2013 · Updated 12 years ago
- A glossary for the United States. ☆42 · Apr 30, 2015 · Updated 10 years ago
- Responsively embed DocumentCloud pages. ☆22 · Jul 5, 2018 · Updated 7 years ago
- A knowledge base construction engine for richly formatted data ☆412 · Jun 23, 2021 · Updated 4 years ago
- Extract tables from PDF pages. ☆298 · Jun 25, 2020 · Updated 5 years ago
- Evaluation Tool for the ICDAR 2019 Competition on Table Detection and Recognition ☆42 · May 8, 2022 · Updated 3 years ago
- Conversions between various OCR formats ☆82 · May 13, 2023 · Updated 2 years ago
- This is a list of various datasets that are collected by States initially and then provided to federal agencies. ☆20 · Dec 17, 2021 · Updated 4 years ago
- How Quartz used AI to help reporters search the Mauritius Leaks ☆48 · Aug 13, 2019 · Updated 6 years ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files ☆9,808 · Updated this week