odie5533 / WarcMiddleware
WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.
☆45 · Updated 6 years ago
Alternatives and similar repositories for WarcMiddleware:
Users who are interested in WarcMiddleware are comparing it to the libraries listed below:
- Serving content from a WARC ☆61 · Updated 12 years ago
- A clean-room clone of the Fever RSS aggregator, focusing on the API ☆61 · Updated 2 years ago
- Bringing sanity to world of messed-up data ☆65 · Updated 10 years ago
- Jabba's headless webkit browser for scraping AJAX-powered webpages. ☆91 · Updated 10 years ago
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 · Updated 7 years ago
- Python library with common functionality for writing web scrapers ☆102 · Updated 9 years ago
- A WayBack Machine Time-Lapse Generator ☆29 · Updated 6 years ago
- Web archiving using Google Chrome ☆44 · Updated 5 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated 6 years ago
- Python script for searching through your digital books and cataloguing them in an easy-to-share list of files. ☆31 · Updated 5 years ago
- A simple, system independent infrastructure for performing web scraping. Utilizes Vagrant virtualbox interface and puppet provisioning to… ☆24 · Updated 10 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 · Updated 9 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs). ☆90 · Updated 3 years ago
- Python WSGI Middleware for adding HTTP/S proxy support to any WSGI Application ☆23 · Updated 4 years ago
- Scrapy middleware that allows crawling only new content ☆79 · Updated 2 years ago
- Save a bunch of web pages as a self-contained, compressed archive file for offline storage and sharing. ☆35 · Updated 12 years ago
- video indexing site ☆217 · Updated 9 years ago
- A Python client for Chrome's DevTools protocol / a headless chrome control library ☆15 · Updated 6 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 · Updated 7 months ago
- Python scripts for scraping bus ticket data from the websites of BoltBus, Greyhound, Megabus, GoBus, Amtrak, Peterpan, and EasternTravel. ☆39 · Updated 4 years ago
- URL Transformation, Sanitization ☆103 · Updated last year
- Simple HTTP cache for Python Requests ☆98 · Updated 8 years ago
- Primary LocalWiki backend server environment ☆48 · Updated 7 years ago
- Specialised bot for periodical grabs and video/audio/etc. webpage scrapes. ☆11 · Updated 7 years ago
- A native web-based client for Slack. ☆23 · Updated 7 years ago
- A small Python script for easy access to Firefox bookmarks and browsing history ☆22 · Updated 4 years ago