TeamHG-Memex/extract-html-diff

Extract the difference between two HTML pages

☆33

Alternatives and similar repositories for extract-html-diff

Users that are interested in extract-html-diff are comparing it to the libraries listed below.

TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / soft404
A classifier for detecting soft 404 pages
☆65 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / tor-proxy
A Tor SOCKS proxy Docker image
☆12 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆19 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / undercrawler
A generic crawler
☆81 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / html-text
Extract text from HTML
☆135 · Apr 8, 2026 · Updated 3 months ago

peterwaksman / Narwhal
Narwhal is a keyword and KEY NARRATIVE manager that creates language-aware classes. Because Narwhal does not use NLP, it avoids complexity…
☆12 · Oct 16, 2018 · Updated 7 years ago

scrapinghub / webpager
Paginating the web
☆37 · Feb 11, 2014 · Updated 12 years ago

TeamHG-Memex / autopager
Detect and classify pagination links
☆107 · Apr 8, 2026 · Updated 3 months ago

aGHz / structominer
Data scraping for a more civilized age
☆17 · Jun 12, 2014 · Updated 12 years ago

WittleWolfie / PyGram
An efficient approximation for tree edit-distance.
☆45 · Sep 6, 2011 · Updated 14 years ago

rmax / databrewer
The missing datasets manager. Like Homebrew, but for datasets. A CLI tool to search for and discover datasets!
☆41 · May 29, 2017 · Updated 9 years ago

scrapinghub / product-extraction-benchmark
☆16 · Apr 10, 2026 · Updated 3 months ago

TeamHG-Memex / deep-deep
Adaptive crawler which uses Reinforcement Learning methods
☆167 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / scrapy-dockerhub
[UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.
☆12 · Apr 8, 2026 · Updated 3 months ago

redapple / parslepy
Python implementation of the Parsley language for extracting structured data from web pages
☆92 · Oct 26, 2017 · Updated 8 years ago

scrapinghub / andi
Library for annotation-based dependency injection
☆24 · Updated this week

TeamHG-Memex / autologin-middleware
Scrapy middleware for the autologin
☆36 · Apr 8, 2026 · Updated 3 months ago

scrapinghub / aile
Automatic Item List Extraction
☆85 · Jun 15, 2016 · Updated 10 years ago

cocrawler / cocrawler
CoCrawler is a versatile web crawler built using modern tools and concurrency.
☆194 · Apr 29, 2022 · Updated 4 years ago

svetlyak40wt / scrapy-useragents
A middleware to use random user agent in Scrapy crawler.
☆33 · Dec 15, 2012 · Updated 13 years ago

TeamHG-Memex / imageSimilarity
Given a new image, determine if it is likely derived from a known image.
☆21 · Apr 8, 2026 · Updated 3 months ago

elliterate / xpath.py
Python library for generating XPath expressions
☆19 · Oct 23, 2023 · Updated 2 years ago

ArturGaspar / scrapy-qtwebkit
☆13 · Dec 4, 2019 · Updated 6 years ago

seagatesoft / webdext
Intelligent Web Data Extractor
☆74 · Dec 5, 2022 · Updated 3 years ago

inferlink / landmark-extractor
☆11 · May 31, 2019 · Updated 7 years ago

usc-isi-i2 / etk
Extraction Toolkit
☆83 · Nov 18, 2021 · Updated 4 years ago

alertot / detectem
detectem - detect software and its version on websites.
☆157 · Mar 25, 2021 · Updated 5 years ago

mediacloud / date_guesser
A library to extract a publication date from a web page, along with a measure of the accuracy.
☆41 · Aug 13, 2019 · Updated 6 years ago

scrapinghub / autopager
Detect and classify pagination links
☆15 · Sep 9, 2020 · Updated 5 years ago

RohanGautam / rust-aws-lambda
Make a rust executable that runs on AWS lambda
☆10 · Mar 2, 2021 · Updated 5 years ago

TeamHG-Memex / aquarium
Splash + HAProxy + Docker Compose
☆195 · Apr 8, 2026 · Updated 3 months ago

TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆121 · Apr 8, 2026 · Updated 3 months ago

webrecorder / wsgiprox
Python WSGI Middleware for adding HTTP/S proxy support to any WSGI Application
☆24 · Oct 27, 2020 · Updated 5 years ago

kserhii / money-parser
Price and currency parsing utility
☆27 · Mar 6, 2023 · Updated 3 years ago

scrapinghub / extruct
Extract embedded metadata from HTML markup
☆966 · Apr 1, 2026 · Updated 3 months ago

matthewruttley / mozclassify
Algorithms for URL Classification
☆19 · Apr 13, 2015 · Updated 11 years ago

scrapy-plugins / scrapy-pagestorage
A scrapy extension to store requests and responses information in storage service
☆27 · Mar 11, 2022 · Updated 4 years ago