Compare html similarity using structural and style metrics
☆218 · May 11, 2023 · Updated 2 years ago
Alternatives and similar repositories for html-similarity
Users that are interested in html-similarity are comparing it to the libraries listed below.
- Simple heuristic for measuring web page similarity (& data set) ☆91 · Feb 23, 2026 · Updated last month
- Show summary of a large number of URLs in a Jupyter Notebook ☆19 · Feb 10, 2026 · Updated last month
- BiLSTM+CRF ☆10 · Jan 15, 2019 · Updated 7 years ago
- ☆16 · Apr 24, 2024 · Updated last year
- ☆13 · Jun 14, 2016 · Updated 9 years ago
- Open source code for MobiPurpose project ☆13 · Mar 25, 2025 · Updated last year
- Intelligent Web Data Extractor ☆74 · Dec 5, 2022 · Updated 3 years ago
- Automatic Item List Extraction ☆86 · Jun 15, 2016 · Updated 9 years ago
- ☆19 · Oct 12, 2016 · Updated 9 years ago
- Generates the most important key-phrase/key-words from a document based on a corpus ☆10 · Jun 17, 2024 · Updated last year
- Lazy reading of file objects for efficient batch processing ☆10 · Sep 6, 2017 · Updated 8 years ago
- A classifier for detecting soft 404 pages ☆60 · Feb 10, 2026 · Updated last month
- extract difference between two html pages ☆33 · Feb 10, 2026 · Updated last month
- A fast TLS Cert scanner to scan HTTPS and SMTP servers ☆14 · Sep 18, 2019 · Updated 6 years ago
- The undocumented API for reporting sites to Safe Browsing ☆11 · Jun 3, 2020 · Updated 5 years ago
- Python client for Zyte API ☆29 · Feb 10, 2026 · Updated last month
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆170 · Oct 28, 2021 · Updated 4 years ago
- Code for "Contextualized Embeddings in Named-Entity Recognition", ECIR 2020 ☆13 · Jul 25, 2024 · Updated last year
- A list of libraries and NLP projects for Portuguese ☆19 · May 22, 2017 · Updated 8 years ago
- Python port of Boilerpipe library ☆16 · Apr 6, 2018 · Updated 7 years ago
- This repository distributes a Windows application using which the user can change the cache folder path of popular web browsers. ☆10 · Sep 29, 2025 · Updated 6 months ago
- 8-bit raspberry pi game ☆14 · Jan 19, 2017 · Updated 9 years ago
- On-the-fly Table Generation - SIGIR'18 ☆10 · Feb 1, 2020 · Updated 6 years ago
- Official repository of "Efficient and Effective Query Expansion for Web Search", Short Paper @ CIKM 2018 ☆15 · Nov 17, 2019 · Updated 6 years ago
- Scrape a website and deploy to Amazon S3 to generate a serverless website. ☆13 · May 30, 2018 · Updated 7 years ago
- ☆10 · Jul 20, 2020 · Updated 5 years ago
- Variable-order CRFs with structure learning ☆17 · Aug 1, 2024 · Updated last year
- Search for answers on StackOverflow from Alfred ☆13 · Feb 3, 2024 · Updated 2 years ago
- Python 3 implementation and documentation of the Hermina-Janos local graph clustering algorithm. ☆24 · Jan 22, 2023 · Updated 3 years ago
- A multi-language segmenter using high-order CRF. ☆17 · Feb 27, 2020 · Updated 6 years ago
- Headless chrome/chromium automation library (unofficial port of puppeteer) ☆3,560 · Aug 5, 2021 · Updated 4 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- A utility to force query DNS over DoH off of CloudFlare API when DNS block is in place ☆10 · Aug 26, 2018 · Updated 7 years ago
- Utility for asserting the structure and content of HTML in python. ☆24 · May 4, 2020 · Updated 5 years ago
- Find elements in HTML by matching them with a skeleton ☆25 · Jul 6, 2022 · Updated 3 years ago
- ☆12 · Apr 29, 2022 · Updated 3 years ago
- Inference with state-of-the-art models (pre-trained by LD-Net / AutoNER / VanillaNER / ...) ☆118 · Dec 15, 2018 · Updated 7 years ago
- This repository is a curated list of pro bono incident response entities. ☆21 · Jun 21, 2023 · Updated 2 years ago
- Performance-focused replacement for Python urllib ☆21 · Oct 2, 2018 · Updated 7 years ago