pypdfium2-team / pypdfium2
Python bindings to PDFium, reasonably cross-platform.
☆668 · Updated this week
Alternatives and similar repositories for pypdfium2
Users who are interested in pypdfium2 are comparing it to the libraries listed below.
- Benchmarking PDF libraries ☆315 · Updated 4 months ago
- Extract structured text from PDFs quickly ☆620 · Updated 5 months ago
- Demos, examples and utilities using PyMuPDF ☆687 · Updated last year
- A Python tool to help extract information from structured PDFs. ☆422 · Updated this week
- PyMuPDF4LLM ☆1,113 · Updated this week
- img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing ☆809 · Updated last week
- UniTable: Towards a Unified Table Foundation Model ☆512 · Updated last year
- 📚 Process PDFs, Word documents and more with spaCy ☆800 · Updated 8 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆193 · Updated last week
- Python binding to the Poppler-cpp PDF library ☆113 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆418 · Updated 2 weeks ago
- Scripts for training Detectron2-based layout models on popular layout analysis datasets ☆215 · Updated 2 years ago
- ☆197 · Updated last week
- Simple package to extract text with coordinates from programmatic PDFs ☆213 · Updated last week
- A Python module that wraps the pdftoppm utility to convert PDF to PIL Image objects ☆1,901 · Updated last year
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆395 · Updated 2 years ago
- A Python library to define and validate data types in Docling. ☆201 · Updated this week
- Adobe PDFServices Python SDK samples ☆160 · Updated 3 months ago
- Streamlit PDF viewer ☆187 · Updated this week
- Software that makes labeling PDFs easy. ☆421 · Updated last year
- ☆164 · Updated 2 weeks ago
- ⚡️ 80x faster Fasttext language detection out of the box | Split text by language ☆254 · Updated last month
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆223 · Updated 3 weeks ago
- Document Layout Analysis ☆391 · Updated last week
- pgvector support for Python ☆1,371 · Updated last month
- Convert Word documents (.docx files) to HTML ☆1,020 · Updated last month
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,761 · Updated 6 months ago
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t… ☆36 · Updated 9 months ago
- Python bindings for Tantivy ☆369 · Updated this week
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆329 · Updated 2 years ago