TeamHG-Memex/arachnado

Web Crawling UI and HTTP API, based on Scrapy and Tornado

☆162

Alternatives and similar repositories for arachnado

Users that are interested in arachnado are comparing it to the libraries listed below.

TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆19 · Apr 8, 2026 · Updated 3 months ago
mikej165 / scrapy-web-ui
scrapy-ui
☆16 · Feb 21, 2014 · Updated 12 years ago
TeamHG-Memex / autologin
A project to attempt to automatically log in to a website given a single seed
☆129 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆121 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / autologin-middleware
Scrapy middleware for the autologin
☆36 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / scrapyrt
HTTP API for Scrapy spiders
☆882 · Jun 29, 2026 · Updated 3 weeks ago
scrapinghub / scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
☆26 · Jun 8, 2016 · Updated 10 years ago
TeamHG-Memex / undercrawler
A generic crawler
☆81 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / autopager
Detect and classify pagination links
☆15 · Sep 9, 2020 · Updated 5 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
rmax / scrapydo
Crochet-based blocking API for Scrapy.
☆47 · Feb 24, 2017 · Updated 9 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / frontera
A scalable frontier for web crawlers
☆1,332 · Jun 6, 2025 · Updated last year
scrapy / pypydispatcher
A fork of http://pydispatcher.sourceforge.net/ with PyPy support
☆16 · Jul 3, 2017 · Updated 9 years ago
TeamHG-Memex / docker-tor-rotator
A rotating socks proxy using Tor, Delegate and Haproxy
☆14 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆33 · Apr 8, 2026 · Updated 3 months ago
istresearch / scrapy-cluster
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
☆1,225 · Nov 7, 2023 · Updated 2 years ago
scrapy-plugins / scrapy-deltafetch
Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
☆276 · Feb 26, 2025 · Updated last year
nasa-jpl-memex / memex-gate
General Architecture for Text Engineering
☆50 · Mar 23, 2016 · Updated 10 years ago
mitll / topic-clustering
☆44 · Jan 15, 2016 · Updated 10 years ago
scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago
Sotera / DatawakeDepot
Loopback web application for administration of Datawake networks
☆10 · May 2, 2017 · Updated 9 years ago
TeamHG-Memex / imageSimilarity
Given a new image, determine if it is likely derived from a known image.
☆21 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / autopager
Detect and classify pagination links
☆107 · Apr 8, 2026 · Updated 3 months ago
nasa-jpl-memex / memex-explorer
Viewers for statistics and dashboarding of Domain Search Engine data
☆128 · Jan 19, 2016 · Updated 10 years ago
TeamHG-Memex / soft404
A classifier for detecting soft 404 pages
☆65 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / html-text
Extract text from HTML
☆135 · Apr 8, 2026 · Updated 3 months ago
mitll / MITIE
MITIE: library and tools for information extraction
☆29 · Jan 22, 2015 · Updated 11 years ago
xiaodaguan / sogou_weixin
weixin.sogou.com WeChat crawler, based on Scrapy
☆29 · Dec 8, 2016 · Updated 9 years ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated 2 years ago
nasa-jpl-memex / topic_space
Topic modeling web application
☆40 · Jul 23, 2015 · Updated 10 years ago
pydepta / pydepta
A python implementation of DEPTA
☆84 · Jan 14, 2017 · Updated 9 years ago
ericwhyne / open-catalog-generator
Code and templates required to build the DARPA open catalog.
☆18 · Mar 23, 2016 · Updated 10 years ago
nik0spapp / sdalg
Web page segmentation and noise removal
☆55 · Feb 4, 2024 · Updated 2 years ago
scrapinghub / webpager
Paginating the web
☆37 · Feb 11, 2014 · Updated 12 years ago
autonlab / tad
Temporal Anomaly Detector (TAD)
☆16 · Nov 2, 2017 · Updated 8 years ago
stummjr / scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
☆18 · Apr 11, 2020 · Updated 6 years ago
rafaelcapucho / scrapy-eagle
Scrapy Eagle is a tool that allows us to run any Scrapy based project in a distributed fashion and monitor how it is going on and how many…
☆24 · Sep 4, 2020 · Updated 5 years ago
roycehaynes / scrapy-rabbitmq
A RabbitMQ Scheduler for Scrapy
☆87 · Aug 9, 2022 · Updated 3 years ago