Splash + HAProxy + Docker Compose
☆195 · Feb 10, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for aquarium
Users who are interested in aquarium compare it to the libraries listed below.
- A generic crawler ☆79 · Feb 10, 2026 · Updated 3 weeks ago
- A classifier for detecting soft 404 pages ☆58 · Feb 10, 2026 · Updated 3 weeks ago
- A component that tries to avoid downloading duplicate content ☆28 · Feb 10, 2026 · Updated 3 weeks ago
- Extract the difference between two HTML pages ☆33 · Feb 10, 2026 · Updated 3 weeks ago
- Scrapy+Splash for JavaScript integration ☆3,239 · Feb 11, 2025 · Updated last year
- Restrict crawl and scraping scope using matchers. ☆26 · Jun 8, 2016 · Updated 9 years ago
- Scrapy middleware for the autologin ☆37 · Feb 10, 2026 · Updated 3 weeks ago
- Extract text from HTML ☆134 · Feb 10, 2026 · Updated 3 weeks ago
- Paginating the web ☆37 · Feb 11, 2014 · Updated 12 years ago
- A project that attempts to automatically log in to a website given a single seed ☆129 · Feb 23, 2026 · Updated 2 weeks ago
- Automatic Item List Extraction ☆86 · Jun 15, 2016 · Updated 9 years ago
- Web Crawling UI and HTTP API, based on Scrapy and Tornado ☆160 · Feb 10, 2026 · Updated 3 weeks ago
- Detect and classify pagination links ☆105 · Feb 10, 2026 · Updated 3 weeks ago
- HTTP API for Scrapy spiders ☆881 · Feb 16, 2026 · Updated 3 weeks ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- ☆16 · Apr 24, 2024 · Updated last year
- A scalable frontier for web crawlers ☆1,330 · Jun 6, 2025 · Updated 9 months ago
- ☆13 · Dec 4, 2019 · Updated 6 years ago
- This is the facade for installation and access to the individual components ☆15 · Feb 10, 2026 · Updated 3 weeks ago
- A middleware layer for Scrapy that detects CAPTCHA tests and solves them ☆44 · Jul 6, 2023 · Updated 2 years ago
- A decorator to write coroutine-like spider callbacks. ☆109 · Dec 26, 2022 · Updated 3 years ago
- Use multiple proxies with Scrapy ☆773 · Feb 10, 2026 · Updated 3 weeks ago
- Analyze scraped data ☆46 · Dec 9, 2019 · Updated 6 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆12 · Feb 23, 2026 · Updated 2 weeks ago
- This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster. ☆1,229 · Nov 7, 2023 · Updated 2 years ago
- Use pyppeteer from a Scrapy spider ☆59 · Feb 5, 2020 · Updated 6 years ago
- Random User-Agent middleware based on fake-useragent ☆689 · Sep 18, 2023 · Updated 2 years ago
- Library for annotation-based dependency injection ☆24 · Updated this week
- Extract embedded metadata from HTML markup ☆951 · Oct 1, 2025 · Updated 5 months ago
- Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup ☆21 · Sep 26, 2016 · Updated 9 years ago
- Simple heuristic for measuring web page similarity (& data set) ☆90 · Feb 23, 2026 · Updated 2 weeks ago
- Docker container running scrapyd with HTTP authentication ☆41 · May 14, 2024 · Updated last year
- A Python library to detect and extract listing data from HTML pages. ☆108 · May 5, 2017 · Updated 8 years ago
- C++ library to parse WARC files ☆11 · Jan 27, 2019 · Updated 7 years ago
- Celery plugin to autoscale based on available CPU, memory, or other system attributes. ☆11 · Dec 8, 2017 · Updated 8 years ago
- Price and currency parsing utility ☆27 · Mar 6, 2023 · Updated 3 years ago
- 🕷 Configuration-based HTML scraper ☆23 · Nov 4, 2025 · Updated 4 months ago
- A queue-controlled browser automation tool for improving web crawl quality ☆64 · Aug 13, 2025 · Updated 6 months ago
- Scrapy integration with Tor for anonymous web scraping ☆46 · Nov 17, 2015 · Updated 10 years ago