py-pdf / awesome-pdf
A curated list of resources around PDF files
☆139 · Updated last year
Alternatives and similar repositories for awesome-pdf
Users who are interested in awesome-pdf are comparing it to the repositories listed below
- CLI tool to extract (meta)data from PDF and manipulate PDF files ☆167 · Updated last week
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆189 · Updated this week
- Demos, examples and utilities using PyMuPDF ☆676 · Updated last year
- A Python tool to help extract information from structured PDFs. ☆411 · Updated last week
- Python library to extract tabular data from images and scanned PDFs ☆280 · Updated last year
- Simplify DOCX files to JSON ☆246 · Updated 10 months ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆153 · Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆395 · Updated last year
- PDF to XML ALTO file converter ☆248 · Updated 2 weeks ago
- OCRmyPDF EasyOCR plugin ☆89 · Updated 4 months ago
- Aspose.Words for Python via .NET examples and showcases ☆124 · Updated last week
- An index of PDF-centric corpora ☆136 · Updated last month
- Benchmarking PDF libraries ☆305 · Updated last month
- Python bindings to PDFium, reasonably cross-platform. ☆612 · Updated last week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆326 · Updated last year
- Logical structure analysis for visually structured documents ☆91 · Updated 3 years ago
- Docx tracked change redlines for the Python ecosystem. ☆78 · Updated last year
- Extract structured text from PDFs quickly ☆576 · Updated 2 months ago
- Adobe PDFServices Python SDK Samples ☆156 · Updated last month
- Ergonomic line-by-line transcription of scanned text. ☆53 · Updated 4 years ago
- NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to … ☆35 · Updated 3 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆74 · Updated 3 years ago
- A collection of ORM-style clients to public patent data ☆116 · Updated 4 months ago
- Multiple and Large PDF Documents Text Extraction. ☆130 · Updated 6 months ago
- Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable p… ☆83 · Updated 11 months ago
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆62 · Updated last year
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆70 · Updated this week
- Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents. ☆203 · Updated this week
- A utility to extract the title from a PDF file ☆142 · Updated 6 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆52 · Updated 5 months ago