kata198 / AdvancedHTMLParser
Fast, indexed Python HTML parser that builds a DOM node tree and provides the common getElementsBy* functions for scraping, testing, modification, and formatting. Also supports XPath.
☆102 Updated last year
Alternatives and similar repositories for AdvancedHTMLParser:
Users who are interested in AdvancedHTMLParser are comparing it to the libraries listed below:
- CSS Selectors for Python ☆293 Updated this week
- CSS related utilities (parsing, serialization, etc) for python ☆31 Updated 6 months ago
- Embed the Duktape JS interpreter in Python ☆81 Updated last year
- Web technology based GUI library for desktop applications ☆75 Updated 6 years ago
- Scriptable Google Chrome™ as a HTTP service + asyncio driver ☆119 Updated last year
- A Python library for extracting titles, images, descriptions and canonical urls from HTML. ☆148 Updated 4 years ago
- Python to JavaScript translator ☆92 Updated 8 years ago
- One interface to read and write the data in various excel formats, import the data into and export the data from databases ☆59 Updated this week
- PyQuery-based scraping micro-framework. ☆116 Updated 3 years ago
- Pyppeteer integration for Scrapy ☆59 Updated 4 years ago
- A decorator to write coroutine-like spider callbacks. ☆110 Updated 2 years ago
- Diff Match Patch is a high-performance library in multiple languages that manipulates plain text. ☆52 Updated 2 months ago
- A wrapper library to read, manipulate and write data in xlsx and xlsm format using openpyxl ☆118 Updated 2 weeks ago
- 📚 Ordered Multivalue Dictionary. Powers furl. ☆68 Updated 3 years ago
- A cross platform clipboard operation library of Python. Works for Windows, Mac and Linux. ☆87 Updated last year
- Python bindings to libmagic ☆34 Updated last year
- Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python ☆275 Updated last week
- Common interface for data container classes ☆67 Updated last month
- minimalist event system for Python ☆85 Updated 5 years ago
- Crochet-based blocking API for Scrapy. ☆46 Updated 8 years ago
- Python powered spreadsheets ☆173 Updated 6 years ago
- IO of git-style object databases ☆221 Updated last month
- yet easy url ☆22 Updated 3 years ago
- Scrapy downloader middleware that stores response HTMLs to disk. ☆18 Updated 10 months ago
- A fork of http://pydispatcher.sourceforge.net/ with PyPy support ☆16 Updated 7 years ago
- Better interface for WebDriver (Selenium 2). ☆58 Updated 4 years ago
- Scrapy spider middleware to split an item into multiple items using a multi-valued key ☆20 Updated 8 years ago
- Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes etc. ☆56 Updated 3 years ago
- Scrapy schema validation pipeline and Item builder using JSON Schema ☆45 Updated 3 years ago
- A very intuitive and useful adapter to libarchive for universal archive access. ☆98 Updated 4 years ago