TeamHG-Memex / aquarium
Splash + HAProxy + Docker Compose
☆195 · Updated this week
Alternatives and similar repositories for aquarium
Users that are interested in aquarium are comparing it to the libraries listed below
- A generic crawler ☆78 · Updated this week
- A component that tries to avoid downloading duplicate content ☆27 · Updated this week
- Lightweight, scriptable browser as a service with an HTTP API ☆4,199 · Aug 2, 2024 · Updated last year
- Extract the difference between two HTML pages ☆32 · Updated this week
- Formasaurus tells you the type of an HTML form and its fields using machine learning ☆119 · Updated this week
- Restrict crawl and scraping scope using matchers. ☆26 · Jun 8, 2016 · Updated 9 years ago
- Scrapy middleware for the autologin ☆36 · Updated this week
- Extract text from HTML ☆134 · Updated this week
- Paginating the web ☆37 · Feb 11, 2014 · Updated 12 years ago
- A project to attempt to automatically login to a website given a single seed ☆128 · Updated this week
- Automatic Item List Extraction ☆86 · Jun 15, 2016 · Updated 9 years ago
- Detect and classify pagination links ☆105 · Updated this week
- HTTP API for Scrapy spiders ☆879 · Updated this week
- ☆16 · Apr 24, 2024 · Updated last year
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Updated this week
- A scalable frontier for web crawlers ☆1,325 · Jun 6, 2025 · Updated 8 months ago
- A complementary proxy to help use SPM with headless browsers ☆108 · May 29, 2023 · Updated 2 years ago
- A middleware layer for Scrapy that detects CAPTCHA tests and solves them ☆44 · Jul 6, 2023 · Updated 2 years ago
- use multiple proxies with Scrapy ☆772 · Updated this week
- Spider templates for automatic crawlers. ☆34 · Jan 8, 2026 · Updated last month
- Broad crawler for domain discovery ☆19 · Updated this week
- Python client for Zyte API ☆28 · Updated this week
- Analyze scraped data ☆46 · Dec 9, 2019 · Updated 6 years ago
- A simple algorithm for clustering web pages, suitable for crawlers ☆35 · Mar 6, 2017 · Updated 8 years ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆11 · Updated this week
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,230 · Nov 7, 2023 · Updated 2 years ago
- Use pyppeteer from a Scrapy spider ☆59 · Feb 5, 2020 · Updated 6 years ago
- Random User-Agent middleware based on fake-useragent ☆690 · Sep 18, 2023 · Updated 2 years ago
- Extract embedded metadata from HTML markup ☆946 · Oct 1, 2025 · Updated 4 months ago
- Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup ☆21 · Sep 26, 2016 · Updated 9 years ago
- Simple heuristic for measuring web page similarity (& data set) ☆90 · Updated this week
- Docker container running scrapyd with HTTP authentication ☆41 · May 14, 2024 · Updated last year
- A Python library to detect and extract listing data from an HTML page. ☆108 · May 5, 2017 · Updated 8 years ago
- C++ library to parse WARC files ☆11 · Jan 27, 2019 · Updated 7 years ago
- Celery plugin to autoscale based on available CPU, memory, or other system attributes. ☆11 · Dec 8, 2017 · Updated 8 years ago
- 🕷 Configuration based html scraper ☆23 · Nov 4, 2025 · Updated 3 months ago
- Price and currency parsing utility ☆27 · Mar 6, 2023 · Updated 2 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆64 · Aug 13, 2025 · Updated 6 months ago