timClicks / slate
The simplest way to extract text from PDFs in Python
☆426, updated 2 years ago
Alternatives and similar repositories for slate:
Users who are interested in slate are comparing it to the libraries listed below.
- A fast and friendly PDF scraping library. ☆772, updated last year
- extract text from any document. no muss. no fuss. ☆3,970, updated 2 months ago
- Extract countries, regions and cities from a URL or text ☆218, updated 4 years ago
- A simple Python module for parsing human names into their individual components ☆668, updated 8 months ago
- Web Content Retrieval for Humans™ ☆616, updated 2 years ago
- Python charting for 80% of humans. ☆333, updated last week
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,281, updated 2 years ago
- Extract tables from PDF pages. ☆283, updated 4 years ago
- A Python toolkit for processing tabular data ☆418, updated 6 months ago
- Python library of web-related functions ☆395, updated this week
- Find dates inside text using Python and get back datetime objects ☆641, updated 9 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆214, updated 5 years ago
- Python script to do PDF OCR conversion using Tesseract ☆373, updated last year
- Easier wrangling of web data. ☆259, updated 6 years ago
- Automatically extracts and normalizes an online article or blog post publication date ☆117, updated last year
- Train NLTK objects with zero code ☆745, updated 4 years ago
- A collection of common regular expressions bundled with an easy to use interface. ☆1,569, updated last year
- Python module to drive the awesome pdftk binary. ☆148, updated last year
- A python script for summarizing articles using nltk ☆544, updated 8 years ago
- Python2's stdlib csv module is nice, but it doesn't support unicode. This module is a drop-in replacement which *does*. If you prefer p… ☆594, updated this week
- Simple, Pythonic extraction of text, shapes and images from PDFs ☆79, updated 4 years ago
- Python interface to the Stanford Named Entity Recognizer ☆291, updated 3 years ago
- A Flask extension to access, upload, download, save and delete files on cloud storage providers such as: AWS S3, Google Storage, Microsof… ☆246, updated 5 years ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab ☆929, updated 6 years ago
- Heuristic based boilerplate removal tool ☆747, updated 9 months ago
- a python library for parsing unstructured western names into name components. ☆599, updated 3 months ago
- A Google Charts API for Python, meant to be used as an alternative to matplotlib. ☆205, updated 7 years ago
- "Scrape Easy" - an extension of the Scrapy framework. ☆188, updated 8 years ago
- Python address detector and parser ☆206, updated last year
- Reworked https://www.readability.com/ parsing library (now https://mercury.postlight.com/ is living alternative) ☆204, updated 9 months ago