TeamHG-Memex / page-compare
Simple heuristic for measuring web page similarity (& data set)
☆90 · Jan 22, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for page-compare
Users who are interested in page-compare are comparing it to the libraries listed below.
- Compare html similarity using structural and style metrics ☆218 · May 11, 2023 · Updated 2 years ago
- A command line tool to cluster html pages based on structural and style similarity. ☆20 · Jan 13, 2026 · Updated last month
- extract difference between two html pages ☆32 · Updated this week
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Updated this week
- Formasaurus tells you the type of an HTML form and its fields using machine learning ☆119 · Updated this week
- A Scrapy extension to log items coverage when the spider shuts down ☆19 · Apr 11, 2020 · Updated 5 years ago
- Given a new image, determine if it is likely derived from a known image. ☆20 · Updated this week
- A generic crawler ☆78 · Updated this week
- Pipeline for distributed Natural Language Processing, made in Python ☆65 · Jan 31, 2017 · Updated 9 years ago
- Scraper built with Scrapy. ☆18 · Aug 14, 2024 · Updated last year
- ☆12 · Apr 7, 2015 · Updated 10 years ago
- ☆21 · Jan 23, 2016 · Updated 10 years ago
- Automatic Item List Extraction ☆86 · Jun 15, 2016 · Updated 9 years ago
- A component that tries to avoid downloading duplicate content ☆27 · Updated this week
- Manage and load dataprotocols.org Data Packages ☆27 · Sep 17, 2015 · Updated 10 years ago
- Pattern-of-Behavior Search Tool ☆11 · Jun 20, 2022 · Updated 3 years ago
- ☆13 · Jun 14, 2016 · Updated 9 years ago
- Vizlinc ☆15 · Jan 14, 2016 · Updated 10 years ago
- Tools for scraping of twitter data, conversion, text analysis and graph construction ☆11 · Aug 1, 2016 · Updated 9 years ago
- code and data used to build a training dataset for dragnet models ☆10 · Nov 29, 2020 · Updated 5 years ago
- Scrapy middleware which allows to crawl only new content ☆79 · Updated this week
- General Architecture for Text Engineering ☆49 · Mar 23, 2016 · Updated 9 years ago
- A distributed in-memory fabric based on shared-memory blocks and datashape. Any language can operate on the data. ☆13 · Feb 12, 2016 · Updated 10 years ago
- Automate The Boring Stuff: Updating WordPress ☆12 · Jun 1, 2021 · Updated 4 years ago
- Events and Situations Ontology ☆14 · Apr 20, 2018 · Updated 7 years ago
- ☆16 · Nov 9, 2020 · Updated 5 years ago
- ☆22 · Feb 29, 2024 · Updated last year
- A rotating socks proxy using Tor, Delegate and Haproxy ☆13 · Updated this week
- Source code of SniperOJ running on server right now ☆12 · Oct 23, 2018 · Updated 7 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- ☆16 · Apr 24, 2024 · Updated last year
- Python binding for gumbo-parser using Cython ☆14 · Aug 16, 2016 · Updated 9 years ago
- Group workspace for improvements to the Columbia Newsblaster system. ☆31 · May 12, 2016 · Updated 9 years ago
- Extract text from HTML ☆134 · Updated this week
- The User Activity Logging Engine, or User-ALE, is a logging mechanism used to quantitatively assess the behavioural and cognitive state o… ☆13 · Aug 26, 2016 · Updated 9 years ago
- (Unofficial) Python API for http://netcraft.com ☆15 · Jul 6, 2016 · Updated 9 years ago
- Basic linked data fragments endpoint. ☆15 · Apr 20, 2017 · Updated 8 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆17 · Sep 11, 2015 · Updated 10 years ago
- Next generation graph processing platform ☆12 · Aug 26, 2016 · Updated 9 years ago