timClicks / slate
The simplest way to extract text from PDFs in Python
☆425 Updated 2 years ago
Related projects
Alternatives and complementary repositories for slate
- A fast and friendly PDF scraping library. ☆773 Updated last year
- Web Content Retrieval for Humans™ ☆612 Updated 2 years ago
- A simple Python module for parsing human names into their individual components ☆657 Updated 5 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆215 Updated 4 years ago
- Python module to drive the awesome pdftk binary. ☆147 Updated last year
- Extract tables from PDF pages. ☆276 Updated 4 years ago
- pdfrw is a pure Python library that reads and writes PDFs ☆1,868 Updated 6 months ago
- Python library of web-related functions ☆392 Updated 3 weeks ago
- A tiny library for Python text normalisation. Useful for ad-hoc text processing. ☆144 Updated 10 months ago
- A library for extracting tables from PDF files ☆87 Updated 4 years ago
- Reads, queries and modifies Microsoft Word 2007/2008 docx files. ☆1,072 Updated 9 years ago
- Python interface to the Stanford Named Entity Recognizer ☆293 Updated 3 years ago
- extract text from any document. no muss. no fuss. ☆3,905 Updated this week
- Tools for parsing messy tabular data. This is now superseded by https://github.com/frictionlessdata/tabulator-py ☆390 Updated last year
- Converts XML to Python objects ☆613 Updated 9 months ago
- Python address detector and parser ☆200 Updated 10 months ago
- Python script to do PDF OCR conversion using Tesseract ☆372 Updated last year
- Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize. ☆618 Updated 7 years ago
- Twitter text processing library (auto linking and extraction of usernames, lists and hashtags). ☆180 Updated 5 years ago
- Find dates inside text using Python and get back datetime objects ☆635 Updated 5 months ago
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,252 Updated last year
- Extract countries, regions and cities from a URL or text ☆220 Updated 4 years ago
- A Python library for OAuth 1.0/a, 2.0, and Ofly. ☆1,603 Updated 2 years ago
- Easier wrangling of web data. ☆259 Updated 6 years ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab ☆930 Updated 6 years ago
- A collection of common regular expressions bundled with an easy to use interface. ☆1,570 Updated last year
- a python library for parsing unstructured western names into name components. ☆593 Updated last week
- Train NLTK objects with zero code ☆747 Updated 4 years ago
- Python interface to Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages ☆539 Updated 3 years ago
- A library for extracting tables from PDF files ☆90 Updated 11 years ago