Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Feb 10, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for sitehound-frontend
Users that are interested in sitehound-frontend are comparing it to the libraries listed below
- extract difference between two html pages ☆33 · Feb 10, 2026 · Updated 3 weeks ago
- a tor socks proxy docker image ☆12 · Feb 10, 2026 · Updated 3 weeks ago
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Feb 10, 2026 · Updated 3 weeks ago
- A classifier for detecting soft 404 pages ☆58 · Feb 10, 2026 · Updated 3 weeks ago
- A component that tries to avoid downloading duplicate content ☆28 · Feb 10, 2026 · Updated 3 weeks ago
- Adaptive crawler which uses Reinforcement Learning methods ☆169 · Feb 10, 2026 · Updated 3 weeks ago
- [UNMAINTAINED] Deploy, run and monitor your Scrapy spiders. ☆12 · Feb 23, 2026 · Updated 2 weeks ago
- A generic crawler ☆79 · Feb 10, 2026 · Updated 3 weeks ago
- Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe… ☆55 · May 21, 2024 · Updated last year
- Scrapy middleware for the autologin ☆37 · Feb 10, 2026 · Updated 3 weeks ago
- Broad crawler for domain discovery ☆20 · Feb 10, 2026 · Updated 3 weeks ago
- A queue-controlled browser automation tool for improving web crawl quality ☆64 · Aug 13, 2025 · Updated 6 months ago
- Extract text from HTML ☆134 · Feb 10, 2026 · Updated 3 weeks ago
- Web Crawling UI and HTTP API, based on Scrapy and Tornado ☆160 · Feb 10, 2026 · Updated 3 weeks ago
- A curated list of amazing libraries, services and resources for working with PDF files ☆16 · Jan 28, 2026 · Updated last month
- Simple heuristic for measuring web page similarity (& data set) ☆90 · Feb 23, 2026 · Updated 2 weeks ago
- Highlight and select phrases in HTML pages. ☆24 · Nov 4, 2019 · Updated 6 years ago
- A Scrapy extension to store request and response information in a storage service ☆27 · Mar 11, 2022 · Updated 3 years ago
- Detect and classify pagination links ☆105 · Feb 10, 2026 · Updated 3 weeks ago
- Paginating the web ☆37 · Feb 11, 2014 · Updated 12 years ago
- The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets! ☆41 · May 29, 2017 · Updated 8 years ago
- Scrapy middleware that allows crawling only new content ☆79 · Feb 10, 2026 · Updated 3 weeks ago
- CVE-2014-0160 (Heartbeat Buffer over-read bug) ☆15 · May 3, 2014 · Updated 11 years ago
- A pytest plugin to run Xvfb (or Xephyr/Xvnc) for tests. ☆76 · Nov 24, 2025 · Updated 3 months ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆45 · Sep 11, 2025 · Updated 5 months ago
- Scrapes a given Facebook user's feed for messages, tags, likes, and datetimes of submissions. ☆10 · Jul 3, 2013 · Updated 12 years ago
- Foundation for building custom subscription applications w/ BigCommerce ☆10 · Dec 16, 2025 · Updated 2 months ago
- Schematics & Firmware for self-tuning portable RF Jammer ☆11 · Feb 2, 2018 · Updated 8 years ago
- A collection of SolarWinds SWQL examples ☆10 · Mar 18, 2021 · Updated 4 years ago
- Find your router's default password ☆14 · Apr 7, 2015 · Updated 10 years ago
- Combines the Neo4j graph database with ChatGPT to turn natural-language medical knowledge materials into a knowledge graph ☆10 · May 23, 2025 · Updated 9 months ago
- A proof of concept for Joomla's CVE-2015-8562 vulnerability (Object Injection RCE) ☆10 · May 3, 2024 · Updated last year
- Content classification/clustering through language processing ☆25 · Mar 10, 2012 · Updated 13 years ago
- Minimal web-based client for NewsBlur. ☆20 · Dec 7, 2014 · Updated 11 years ago
- Vendont is a Venmo transaction finder/scraper. It uses Venmo's own public API system to fetch all transactions at a given time. ☆10 · Jun 16, 2019 · Updated 6 years ago
- A simple maintenance tracking tool for your vehicles. ☆12 · Nov 1, 2025 · Updated 4 months ago
- BlockCAT token sale smart contracts. ☆11 · Oct 19, 2017 · Updated 8 years ago
- Docker Image packaging for Pentaho BI Server ☆10 · Jul 6, 2015 · Updated 10 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆48 · Mar 19, 2018 · Updated 7 years ago