N0taN3rd / simplechrome
Webrecorder's DevTools Protocol Automation Library
☆17 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for simplechrome
- Python WSGI Middleware for adding HTTP/S proxy support to any WSGI Application ☆22 · Updated 4 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- extract difference between two html pages ☆32 · Updated 6 years ago
- A fork of http://pydispatcher.sourceforge.net/ with PyPy support ☆16 · Updated 7 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆14 · Updated 10 months ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆42 · Updated 6 years ago
- An index data structure for approximate string search. ☆23 · Updated 5 years ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆40 · Updated 3 months ago
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Updated 3 years ago
- (Archived) A Python library for record linkage and deduplication. ☆19 · Updated 8 months ago
- Trough: Big data, small databases. ☆40 · Updated 3 months ago
- Scrapy middleware for the autologin ☆37 · Updated 6 years ago
- url canonicalization library for python and java ☆33 · Updated 2 years ago
- Find which links on a web page are pagination links ☆29 · Updated 7 years ago
- Python binding for gumbo-parser using Cython ☆14 · Updated 8 years ago
- Utility library to turn country names into ISO two-letter codes ☆66 · Updated last month
- Extract structured data from HTML and XML documents like a boss. ☆50 · Updated last year
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆11 · Updated last year
- Extract, parse and populate templates from strings ☆27 · Updated 5 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 · Updated 9 years ago
- The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets! ☆41 · Updated 7 years ago
- csvcat ☆22 · Updated 8 years ago
- Performance-focused replacement for Python urllib ☆21 · Updated 6 years ago
- Python bindings for CLD2. ☆17 · Updated 6 years ago
- code and data used to build a training dataset for dragnet models ☆10 · Updated 3 years ago
- templatemaker is a Python library that can extract data from files with a similar format, like HTML pages. ☆63 · Updated 4 years ago
- Restrict crawl and scraping scope using matchers. ☆25 · Updated 8 years ago
- Faster replacement for Python's urlparse module ☆46 · Updated 6 years ago