matiskay/html-cluster

A command line tool to cluster html pages based on structural and style similarity.

☆20

Alternatives and similar repositories for html-cluster

Users who are interested in html-cluster are comparing it to the libraries listed below.

matiskay / html-similarity
Compare html similarity using structural and style metrics
☆219 · Updated this week
USCDataScience / autoextractor
A toolkit for clustering web pages based on various similarity measures.
☆34 · Oct 27, 2021 · Updated 4 years ago
WittleWolfie / PyGram
An efficient approximation for tree edit-distance.
☆45 · Sep 6, 2011 · Updated 14 years ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆33 · Apr 8, 2026 · Updated 3 months ago
cbail / web-scraping-with-r-extended-edition
Repository for one-day course "Web Scraping with R, extended edition"
☆10 · Mar 21, 2017 · Updated 9 years ago
basilica-ai / basilica-r-client
Basilica client for R
☆11 · Dec 8, 2022 · Updated 3 years ago
niclasmattsson / Supergrid
A capacity expansion model of the electricity system for arbitrary world regions, written in Julia 1.x.
☆12 · Jun 30, 2026 · Updated 3 weeks ago
rekcahemal / Trinity
This is a web site scraper. Collects all urls from any site.
☆16 · Apr 28, 2015 · Updated 11 years ago
computational-culture-lab / comp-acculturation
Reference implementation for measuring linguistic cultural distances between individuals and groups.
☆15 · Aug 7, 2019 · Updated 6 years ago
Future-Energy-Associates / granular_certificate_registry
An open-source platform to demonstrate the capabilities of a Granular Certificate registry that conforms to the EnergyTag Standards and A…
☆13 · Jul 20, 2026 · Updated last week
Irieo / 247-procurement-paper
Code for the paper on 247-CFE procurement
☆11 · Dec 13, 2024 · Updated last year
recap-utr / arguebuf-python
Create and analyze argument graphs and serialize them via Protobuf
☆10 · Jul 21, 2026 · Updated last week
jsphpl / redirect-mapper
Generate a redirect map from two sitemaps for website migration.
☆13 · May 4, 2018 · Updated 8 years ago
namiyousef / argument-mining
Repository for NLP project. Name to be changed when we decide on a project
☆16 · Apr 19, 2022 · Updated 4 years ago
anacrolix / sqlrpc
SQL over RPC, specifically for SQLite
☆10 · Jul 17, 2018 · Updated 8 years ago
aclai-lab / ModalDecisionTrees.jl
Julia implementation of Modal Decision Trees & Forests, for interpretable classification of spatial and temporal data. Long live Symbolic…
☆12 · Jun 16, 2026 · Updated last month
statsmodels / statsmodels.github.io
documentation for statsmodels - currently temporary structure and location
☆13 · Updated this week
cuhksz-nlp / SAPar
☆12 · Dec 23, 2022 · Updated 3 years ago
ncrocfer / csr2f
CSR2F is a Python tool used for generating CSRF (Cross-Site Request Forgery) exploits
☆13 · Aug 22, 2019 · Updated 6 years ago
aws-samples / amazon-textract-a2i-pdf
☆17 · Jul 15, 2022 · Updated 4 years ago
gansidui / bktree
bk-tree for golang
☆11 · Jul 30, 2022 · Updated 3 years ago
kambrium / staticmapservice
A web service that generates static maps
☆18 · Dec 30, 2025 · Updated 6 months ago
liujiashen9307 / Forcast
Little time-series forecasting app for fun! More models/methods will be included after the june 15! Link: jasonliushiny.shinyapps.io/Forc…
☆14 · Nov 8, 2016 · Updated 9 years ago
roedoejet / convertextract
Extract and find/replace text based on arbitrary correspondences while preserving original file formatting. This library is a fork from t…
☆11 · Sep 8, 2023 · Updated 2 years ago
google-research-datasets / Textual-Entailment-New-Protocols
This data release is meant to accompany and document the paper: https://arxiv.org/abs/2004.11997 Collecting Entailment Data for Pretrain…
☆14 · Sep 29, 2020 · Updated 5 years ago
src-d / go-vitess
An automatic filter-branch of Go libraries from the great Vitess project.
☆14 · Jun 2, 2019 · Updated 7 years ago
google-marketing-solutions / prefetchalyzer
Identify impactful pre-fetch and pre-cache opportunities across web pages in user flow by analyzing HAR logs
☆15 · Feb 18, 2025 · Updated last year
TheCedarPrince / LinAlgTuts.jl
Linear Algebra tutorials written in pure Julia. This repository contains tutorials that go alongside the textbook Introduction to Linear …
☆16 · Jul 27, 2020 · Updated 6 years ago
AntoineAugusti / vacances-scolaires
Vacances scolaires en France
☆16 · Mar 26, 2026 · Updated 4 months ago
chriskiehl / GooeyVideo
A small collection of FFMPEG tools which I use while working on Gooey
☆15 · May 28, 2025 · Updated last year
do-community / python3_web_api_tutorial
☆11 · Oct 24, 2020 · Updated 5 years ago
liuzl / pullword
Unsupervised Word Discovery
☆10 · Jul 26, 2019 · Updated 7 years ago
Geospatial-Data-Science / geospatial-machine-learning
A curated list of resources focused on Machine Learning in Geospatial Data Science.
☆12 · Jun 21, 2018 · Updated 8 years ago
sonumja / NLP_ArticleSpinner
Simple NLP Article Spinner algorithm
☆12 · Aug 30, 2018 · Updated 7 years ago
USEPA / EPA_OMEGA_Model
Model to evaluate policies for reducing greenhouse gas emissions from light duty vehicles
☆20 · Nov 7, 2025 · Updated 8 months ago
mrtkp9993 / AnomalyDetectionShiny
Shiny app for anomaly detection using AnomalyDetection package.
☆11 · Jul 15, 2019 · Updated 7 years ago
yunan4nlp / E-NNRSTParser
A neural RST discourse parser with well pre-trained XLNet.
☆17 · Jun 13, 2022 · Updated 4 years ago
REMitchell / data-day-seattle
Sample Crawler for Data Day Seattle
☆10 · Jun 27, 2015 · Updated 11 years ago
ianramzy / ticker-iq
📈 Stock screener and portfolio analyzer, providing key insights on financial reports, news articles and more!
☆13 · Jun 24, 2019 · Updated 7 years ago