py-pdf / awesome-pdf
A curated list of resources around PDF files
☆127 · Updated 8 months ago
Alternatives and similar repositories for awesome-pdf:
Users who are interested in awesome-pdf are comparing it to the repositories listed below
- Benchmarking PDF libraries ☆271 · Updated last year
- A Python tool to help extract information from structured PDFs. ☆402 · Updated 2 weeks ago
- Adobe PDFServices Python SDK Samples ☆148 · Updated 5 months ago
- Python API for PDF documents ☆119 · Updated 7 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆178 · Updated last week
- CLI tool to extract (meta)data from PDF and manipulate PDF files ☆140 · Updated 3 weeks ago
- Python binding to the Poppler-cpp PDF library ☆109 · Updated 7 months ago
- Demos, examples and utilities using PyMuPDF ☆646 · Updated 9 months ago
- Simplify DOCX files to JSON ☆232 · Updated 6 months ago
- A Python library to extract tabular data from PDFs ☆66 · Updated last week
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆389 · Updated 8 months ago
- Python library to extract tabular data from images and scanned PDFs ☆277 · Updated 8 months ago
- Append/Concatenate .docx documents ☆107 · Updated 8 months ago
- Pandoc (Python Library) ☆154 · Updated 7 months ago
- PDF to XML ALTO file converter ☆236 · Updated this week
- Pure-Python library for adding annotations to PDFs ☆201 · Updated 4 years ago
- Create customizable PowerPoint presentations (.pptx) using a predefined layout template ☆34 · Updated 4 years ago
- A CLI tool for automatically generating bookmarks for PDF documents (i.e. scouting the PDF document for you) ☆16 · Updated last year
- A post-processing tool for scanned sheets of paper. ☆81 · Updated last year
- Extract structured text from PDFs quickly ☆464 · Updated last month
- An index of PDF-centric corpora ☆127 · Updated 3 weeks ago
- Viewer for the structure extracted by Grobid on PDF documents ☆48 · Updated 2 months ago
- Logical structure analysis for visually structured documents ☆89 · Updated 2 years ago
- Extract tables from scanned PDF documents into a CSV file using OCR and image processing ☆132 · Updated 6 years ago
- Docx tracked change redlines for the Python ecosystem. ☆60 · Updated 9 months ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆148 · Updated last year
- 💡✏️ ⬇️ JSON to Markdown converter - Generate Markdown from format-independent JSON ☆71 · Updated 6 years ago
- Repository for deepdoctection tutorial notebooks ☆43 · Updated 4 months ago
- The scripts for training Detectron2-based Layout Models on popular layout analysis datasets ☆210 · Updated last year
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆60 · Updated 11 months ago