timClicks/slate

The simplest way to extract text from PDFs in Python

☆427

Alternatives and similar repositories for slate

Users who are interested in slate are comparing it to the libraries listed below.

jcushman / pdfquery
A fast and friendly PDF scraping library.
☆781 · Oct 17, 2023 · Updated 2 years ago

ecatkins / xpdf_python
Python wrapper for xpdf
☆19 · Nov 28, 2019 · Updated 6 years ago

euske / pdfminer
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
☆5,283 · Dec 7, 2022 · Updated 3 years ago

18F / doc_processing_toolkit
Python library to extract text from PDF, and default to OCR when text extraction fails.
☆62 · Oct 6, 2017 · Updated 8 years ago

drj11 / pdftables
A library for extracting tables from PDF files
☆93 · Aug 2, 2020 · Updated 5 years ago

miohtama / pdf-to-html
PDF to JPEG images + HTML with <img> alt text converter
☆49 · May 28, 2014 · Updated 12 years ago

mgorkove / pdfToTxt
Command line interface to convert multiple PDFs to text files. Uses pdfminer.
☆13 · Nov 22, 2018 · Updated 7 years ago

py-pdf / pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
☆10,121 · Jun 30, 2026 · Updated 3 weeks ago

deanmalmgren / textract
extract text from any document. no muss. no fuss.
☆4,670 · Jul 11, 2026 · Updated last week

angeloskath / Pdf-to-text-via-PHP
Collection of classes that parse pdf files with the main purpose of converting a pdf to plain text
☆22 · Dec 5, 2011 · Updated 14 years ago

pdfminer / pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
☆7,002 · Mar 13, 2026 · Updated 4 months ago

syllabs / pdf2text
A PDFMiner wrapper to ease the text extraction from pdf files.
☆24 · Apr 25, 2013 · Updated 13 years ago

pmaupin / pdfrw
pdfrw is a pure Python library that reads and writes PDFs
☆1,908 · Apr 29, 2024 · Updated 2 years ago

SwoopSearch / pyaddress
pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apart…
☆100 · Sep 16, 2019 · Updated 6 years ago

chezou / tabula-py
Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame
☆2,315 · Dec 5, 2024 · Updated last year

ashima / pdf-table-extract
Extract tables from PDF pages.
☆300 · Jun 25, 2020 · Updated 6 years ago

chrismattmann / tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
☆1,661 · Jul 1, 2026 · Updated 2 weeks ago

seattletimes / police-killings
A Seattle Times investigation on Washington's "evil intent" laws
☆20 · Sep 28, 2015 · Updated 10 years ago

nelson-liu / lexical-semantic-recognition
☆18 · Jun 12, 2023 · Updated 3 years ago

hubgit / md-ld
Markdown for Linked Data
☆17 · Apr 4, 2015 · Updated 11 years ago

metachris / pdfx
Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs.
☆1,077 · Jun 15, 2023 · Updated 3 years ago

alexey-osipenko / giza-pp
Giza++
☆12 · May 12, 2015 · Updated 11 years ago

rileyedmunds / complexcnn
research on convolutional neural networks in fourier space
☆15 · Feb 6, 2017 · Updated 9 years ago

stefanw / scrapa
Python 3 AsyncIO powered scraping framework with batteries included
☆20 · Sep 8, 2016 · Updated 9 years ago

nikhilNathwani / NBA_cron
Task scheduler that automatically runs scripts after each NBA game
☆12 · Feb 9, 2015 · Updated 11 years ago

teralytics / flowmap.query
An exploratory visualization tool for the analysis of movements between geographic locations
☆13 · Dec 9, 2022 · Updated 3 years ago

commonsense / conceptdb
A platform for storing large semantic networks on MongoDB
☆22 · Jun 20, 2011 · Updated 15 years ago

ahawker / scratchdir
Context manager to maintain your temporary directories/files.
☆17 · Jan 23, 2023 · Updated 3 years ago

amandabee / CUNY-data-skills
This semester we will work together to gather, analyze and visualize numbers you need to understand your audience and to tell interactive…
☆17 · Oct 5, 2018 · Updated 7 years ago

pnpnpn / street-address
Street address parser and formatter
☆91 · Sep 12, 2019 · Updated 6 years ago

gjreda / pydata2015sea
Materials for my PyData Seattle talk
☆21 · Aug 6, 2015 · Updated 10 years ago

blakev / gevent-tasks
Task manager built around the gevent green threads library.
☆17 · Feb 3, 2019 · Updated 7 years ago

fbkarsdorp / twitter-workshop
Workshop materials for scraping Twitter with Python
☆13 · May 25, 2016 · Updated 10 years ago

aluarosi / congreso
☆11 · May 25, 2015 · Updated 11 years ago

alexandrevicenzi / fluentmail
Python SMTP client and Email for Humans™
☆81 · Dec 4, 2018 · Updated 7 years ago

jazzband / tablib
Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
☆4,756 · Updated this week

tamirhassan / pdfxtk
PDF Extraction Toolkit
☆43 · Nov 23, 2020 · Updated 5 years ago

benjamincohen1 / SpamClassifierOfficeHours
The code from my codementor office hours on an introduction to machine learning and natural language processing
☆18 · Feb 5, 2015 · Updated 11 years ago

deadlyforcedb / data-recipes
A small repo of notes and scripts for collecting data on U.S. deadly force police incidents
☆10 · Aug 9, 2015 · Updated 10 years ago