euske/pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.

☆5,283

Alternatives and similar repositories for pdfminer

Users who are interested in pdfminer are comparing it to the libraries listed below.

pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,001 · Mar 13, 2026 · Updated 4 months ago
py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 2 weeks ago
pmaupin / pdfrw
pdfrw is a pure Python library that reads and writes PDFs
☆1,908 · Apr 29, 2024 · Updated 2 years ago
timClicks / slate
The simplest way to extract text from PDFs in Python
☆427 · Jul 7, 2022 · Updated 4 years ago
jsvine / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera - and easily extract text and tables.
☆10,565 · Jun 17, 2026 · Updated last month
dpapathanasiou / pdfminer-layout-scanner
A more complete example of programming with PDFMiner, which continues where the default documentation stops
☆216 · Dec 3, 2019 · Updated 6 years ago
deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,669 · Jul 11, 2026 · Updated last week
chezou / tabula-py
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
☆2,315 · Dec 5, 2024 · Updated last year
jcushman / pdfquery
A fast and friendly PDF scraping library.
☆781 · Oct 17, 2023 · Updated 2 years ago
coolwanglu / pdf2htmlEX
Convert PDF to HTML without losing text or format.
☆10,606 · Jun 2, 2023 · Updated 3 years ago
tabulapdf / tabula
Tabula is a tool for liberating data tables trapped inside PDF files
☆7,446 · Mar 14, 2025 · Updated last year
atlanhq / camelot
Camelot: PDF Table Extraction for Humans
☆3,716 · Jan 5, 2023 · Updated 3 years ago
seatgeek / fuzzywuzzy
Fuzzy String Matching in Python
☆9,262 · Feb 24, 2023 · Updated 3 years ago
WZBSocialScienceCenter / pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
☆2,255 · Jun 24, 2022 · Updated 4 years ago
piskvorky / gensim
Topic Modelling for Humans
☆16,464 · Nov 1, 2025 · Updated 8 months ago
python-openxml / python-docx
Create and modify Word documents with Python
☆5,681 · Jun 17, 2025 · Updated last year
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆33,756 · May 19, 2026 · Updated 2 months ago
clips / pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
☆8,857 · Jun 10, 2024 · Updated 2 years ago
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,258 · Updated this week
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
☆15,109 · Jul 8, 2026 · Updated last week
tesseract-ocr / tesseract
Tesseract Open Source OCR Engine (main repository)
☆75,444 · Updated this week
facebookresearch / fastText
Library for fast text representation and classification.
☆26,548 · Mar 22, 2024 · Updated 2 years ago
scrapy / scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
☆63,234 · Jul 13, 2026 · Updated last week
google / python-fire
Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
☆28,220 · Jul 1, 2026 · Updated 2 weeks ago
camelot-dev / camelot
A Python library to extract tabular data from PDFs
☆3,786 · Updated this week
madmaze / pytesseract
A Python wrapper for Google Tesseract
☆6,371 · Jul 13, 2026 · Updated last week
sloria / TextBlob
Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
☆9,541 · Updated this week
nltk / nltk
NLTK Source
☆14,678 · Jul 8, 2026 · Updated last week
tqdm / tqdm
A Fast, Extensible Progress Bar for Python and CLI
☆31,243 · Updated this week
chrismattmann / tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
☆1,661 · Jul 1, 2026 · Updated 2 weeks ago
keras-team / keras
Deep Learning for humans
☆64,167 · Updated this week
psf / requests
A simple, yet elegant, HTTP library.
☆54,136 · Jul 9, 2026 · Updated last week
bokeh / bokeh
Interactive Data Visualization in the browser, from Python
☆20,416 · Updated this week
jazzband / tablib
Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
☆4,756 · Updated this week
celery / celery
Distributed Task Queue (development branch)
☆28,699 · Updated this week
coleifer / peewee
a small, expressive orm -- supports postgresql, mysql, sqlite, now with asyncio
☆11,981 · Updated this week
psf / requests-html
Pythonic HTML Parsing for Humans™
☆13,827 · Apr 16, 2024 · Updated 2 years ago
pypa / pipenv
Python Development Workflow for Humans.
☆25,051 · Updated this week
sanic-org / sanic
Accelerate your web app development | Build fast. Run fast.
☆18,633 · Updated this week