The simplest way to extract text from PDFs in Python
☆428 · Jul 7, 2022 · Updated 3 years ago
Alternatives and similar repositories for slate
Users that are interested in slate are comparing it to the libraries listed below.
- A fast and friendly PDF scraping library. ☆781 · Oct 17, 2023 · Updated 2 years ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,286 · Dec 7, 2022 · Updated 3 years ago
- A library for extracting tables from PDF files ☆93 · Aug 2, 2020 · Updated 5 years ago
- extract text from any document. no muss. no fuss. ☆4,637 · May 7, 2026 · Updated last month
- PDF Parser: fork with Python 2+3 support using six ☆25 · Dec 6, 2015 · Updated 10 years ago
- Command line interface to convert multiple PDFs to text files. Uses pdfminer. ☆13 · Nov 22, 2018 · Updated 7 years ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files ☆10,099 · Updated this week
- pyaddress is an address parsing library, taking the guesswork out of using addresses in your applications. We use it as part of our apart… ☆100 · Sep 16, 2019 · Updated 6 years ago
- Pure-Python PDF Library; this repository is no longer maintained, please see https://github.com/knowah/PyPDF2/ instead. ☆275 · Aug 24, 2020 · Updated 5 years ago
- Circular buffer implementation in Nim ☆10 · Apr 21, 2023 · Updated 3 years ago
- A slim, non-SWIG Python adapter to CTesseract (Tesseract OCR for C). ☆24 · Apr 25, 2014 · Updated 12 years ago
- Community maintained fork of pdfminer - we fathom PDF ☆7,000 · Mar 13, 2026 · Updated 3 months ago
- A simple, system independent infrastructure for performing web scraping. Utilizes Vagrant virtualbox interface and puppet provisioning to… ☆24 · Jul 30, 2014 · Updated 11 years ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,908 · Apr 29, 2024 · Updated 2 years ago
- Extract tables from PDF pages. ☆301 · Jun 25, 2020 · Updated 6 years ago
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame ☆2,315 · Dec 5, 2024 · Updated last year
- Simple Bayesian spam rating in Python that is easy to use, small, contained in a single file, and doesn't require any external modules. ☆30 · Mar 11, 2015 · Updated 11 years ago
- Word Graph utility built with NLTK and TextBlob ☆18 · Aug 16, 2013 · Updated 12 years ago
- Javascript to present HTML footnotes as a popover. ☆44 · Oct 23, 2014 · Updated 11 years ago
- A DSL to build Lucene text queries in Python. ☆38 · Jan 5, 2017 · Updated 9 years ago
- Python Client for Microsoft Project Oxford ☆10 · Jun 7, 2016 · Updated 10 years ago
- Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community. ☆1,657 · Jun 10, 2026 · Updated 3 weeks ago
- A Seattle Times investigation on Washington's "evil intent" laws ☆20 · Sep 28, 2015 · Updated 10 years ago
- Extract text, metadata and references (pdf, url, doi, arxiv) from PDF. Optionally download all referenced PDFs. ☆1,075 · Jun 15, 2023 · Updated 3 years ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆216 · Dec 3, 2019 · Updated 6 years ago
- A collection of Fabric utilities largely for Django deployment. ☆28 · Apr 15, 2013 · Updated 13 years ago
- A framework to allow the matching of string entities using customised sets of transformations and matchers, plus a tool to produce the ne… ☆34 · Apr 18, 2017 · Updated 9 years ago
- Python 3 AsyncIO powered scraping framework with batteries included ☆20 · Sep 8, 2016 · Updated 9 years ago
- Slinky, a high-performance web crawler / text analytics in Python, Redis, Hadoop, R, Gephi ☆40 · Aug 30, 2010 · Updated 15 years ago
- An exploratory visualization tool for the analysis of movements between geographic locations ☆13 · Dec 9, 2022 · Updated 3 years ago
- Context manager to maintain your temporary directories/files. ☆17 · Jan 23, 2023 · Updated 3 years ago
- Workshop materials for scraping Twitter with Python ☆13 · May 25, 2016 · Updated 10 years ago
- A small repo of notes and scripts for collecting data on U.S. deadly force police incidents ☆10 · Aug 9, 2015 · Updated 10 years ago
- Street address parser and formatter ☆91 · Sep 12, 2019 · Updated 6 years ago
- Python notebooks analyzing campaign finance and lobbying activity data from California Secretary of State’s CAL-ACCESS database ☆21 · Mar 3, 2018 · Updated 8 years ago
- THIS IS A FORK! The main repo is at the pingo-io organization ☆12 · May 4, 2015 · Updated 11 years ago
- Listener for PostgreSQL notifications that dispatch via command execution ☆14 · Sep 27, 2022 · Updated 3 years ago
- ☆11 · May 25, 2015 · Updated 11 years ago
- Last Writer Slicing: data provenance tracking for concurrent program debugging & analysis ☆13 · Nov 14, 2014 · Updated 11 years ago