kata198 / AdvancedHTMLParser
Fast, indexed Python HTML parser which builds a DOM node tree, providing common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.
☆100 · Updated last year
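The description's key idea is an *indexed* DOM tree: lookup tables are filled in while nodes are created, so a getElementsBy*-style query can be a dictionary hit rather than a full tree walk. The sketch below illustrates that idea with only the standard library's `html.parser`. It is an assumption-laden illustration, not AdvancedHTMLParser's API or implementation. The class `IndexedTreeParser`, its `get_element_by_id` / `get_elements_by_tag_name` / `get_elements_by_class_name` methods, and the `VOID_TAGS` set are hypothetical names chosen here.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical, partial list of tags that never get a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """A minimal DOM node: tag, attributes, parent, children, and text."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class IndexedTreeParser(HTMLParser):
    """Hypothetical sketch: builds a node tree and indexes it while parsing."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self._stack = [self.root]  # open elements, innermost last
        self._by_id = {}
        self._by_tag = defaultdict(list)
        self._by_class = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self._stack[-1])
        self._stack[-1].children.append(node)
        self._index(node)
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tag == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        self._stack[-1].text += data

    def _index(self, node):
        # Indexes are updated at creation time, so queries never walk the tree.
        self._by_tag[node.tag].append(node)
        if node.attrs.get("id"):
            self._by_id[node.attrs["id"]] = node
        for cls in (node.attrs.get("class") or "").split():
            self._by_class[cls].append(node)

    def get_element_by_id(self, element_id):
        return self._by_id.get(element_id)

    def get_elements_by_tag_name(self, tag):
        return list(self._by_tag.get(tag.lower(), []))

    def get_elements_by_class_name(self, cls):
        return list(self._by_class.get(cls, []))


doc = IndexedTreeParser()
doc.feed('<div id="main" class="box wide"><p class="box">Hi</p><br/><p>There</p></div>')
doc.close()
print(doc.get_element_by_id("main").tag)                        # div
print(len(doc.get_elements_by_tag_name("p")))                   # 2
print([n.tag for n in doc.get_elements_by_class_name("box")])   # ['div', 'p']
```

The trade-off this sketch assumes is extra memory and bookkeeping at parse time in exchange for constant-time lookups by id, tag, or class. A real implementation would also need to keep the indexes in sync when nodes are added, removed, or have their attributes modified.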
Related projects
Alternatives and complementary repositories for AdvancedHTMLParser
- CSS Selectors for Python · ☆291 · Updated last month
- A decorator to write coroutine-like spider callbacks. · ☆110 · Updated last year
- A Python library for extracting titles, images, descriptions and canonical URLs from HTML. · ☆148 · Updated 4 years ago
- Automatic persistence for Python objects · ☆46 · Updated 2 weeks ago
- Python Abstract Syntax Tree viewer in Qt · ☆104 · Updated last year
- Scrapinghub Command Line Client · ☆125 · Updated 6 months ago
- Python bindings to the Brotli compression library · ☆148 · Updated 2 months ago
- Library to populate items using XPath and CSS with a convenient API · ☆45 · Updated last month
- URL normalization for Python · ☆94 · Updated 2 years ago
- Common interface for data container classes · ☆62 · Updated last month
- Modern robots.txt parser for Python · ☆189 · Updated 10 months ago
- Simple web UI for Scrapy spider management via Scrapyd · ☆50 · Updated 6 years ago
- Asyncio web crawling framework. Work in progress. · ☆18 · Updated 3 months ago
- A pure-Python robots.txt parser with support for modern conventions. · ☆55 · Updated this week
- Embed the Duktape JS interpreter in Python · ☆81 · Updated last year
- IO of git-style object databases · ☆220 · Updated last month
- Web-technology-based GUI library for desktop applications · ☆74 · Updated 6 years ago
- Python library for extracting text from various file formats (for indexing). · ☆111 · Updated 2 years ago
- A simple, Qt WebEngine-powered web browser with built-in functionality for basic Scrapy web scraping support. · ☆106 · Updated 6 months ago
- Extension to ast that allows ast -> Python code generation. · ☆76 · Updated 6 years ago
- ☆29 · Updated 3 years ago
- Pure-Python HTTP/2 header encoding · ☆73 · Updated this week
- Extract text from HTML · ☆131 · Updated 4 years ago
- Python Regular Expressions for Humans™. · ☆231 · Updated 6 years ago
- A simple, immutable URL class with a clean API for interrogation and manipulation. · ☆294 · Updated last year
- A handy Python library to validate, manipulate and generate strings · ☆57 · Updated last year
- Python WSGI Middleware for adding HTTP/S proxy support to any WSGI Application · ☆22 · Updated 4 years ago
- Crochet: use Twisted anywhere! · ☆236 · Updated 2 months ago
- A modern CSS selector implementation for BeautifulSoup · ☆206 · Updated last month
- Use pyppeteer from a Scrapy spider · ☆60 · Updated 4 years ago