Compare HTML similarity using structural and style metrics
☆218 · May 11, 2023 · Updated 2 years ago
Alternatives and similar repositories for html-similarity
Users that are interested in html-similarity are comparing it to the libraries listed below
- Simple heuristic for measuring web page similarity (& data set) ☆90 · Feb 23, 2026 · Updated 2 weeks ago
- A toolkit for clustering web pages based on various similarity measures. ☆34 · Oct 27, 2021 · Updated 4 years ago
- Generates the most important key-phrase/key-words from a document based on a corpus ☆10 · Jun 17, 2024 · Updated last year
- Show summary of a large number of URLs in a Jupyter Notebook ☆17 · Feb 10, 2026 · Updated last month
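- Web page similarity detection: determines page similarity based on HTML page structure; can be used for similarity computation, unauthorized-access (privilege escalation) detection, etc. ☆282 · Jul 27, 2019 · Updated 6 years ago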
- Automatic Item List Extraction ☆86 · Jun 15, 2016 · Updated 9 years ago
- Intelligent Web Data Extractor ☆74 · Dec 5, 2022 · Updated 3 years ago
- Lazy reading of file objects for efficient batch processing ☆10 · Sep 6, 2017 · Updated 8 years ago
- Infer Types by Python Tracing ☆11 · Aug 29, 2022 · Updated 3 years ago
- BiLSTM+CRF ☆10 · Jan 15, 2019 · Updated 7 years ago
- Example using R on Amazon ec2 ☆11 · Jun 22, 2020 · Updated 5 years ago
- On-the-fly Table Generation - SIGIR'18 ☆10 · Feb 1, 2020 · Updated 6 years ago
- Web-based IDE for Python, Scheme, and SQL intended for students taking CS 61A. ☆11 · Dec 10, 2022 · Updated 3 years ago
- Implementation of Cascaded Head-colliding Attention (ACL'2021) ☆11 · Sep 16, 2021 · Updated 4 years ago
- ☆13 · Jun 14, 2016 · Updated 9 years ago
- ☆12 · Apr 29, 2022 · Updated 3 years ago
- ☆12 · Jan 22, 2020 · Updated 6 years ago
- Utility for asserting the structure and content of HTML in Python. ☆24 · May 4, 2020 · Updated 5 years ago
- Automation script to download JSON MISP files from an SFTP server and import them via API to a MISP instance. ☆15 · May 12, 2023 · Updated 2 years ago
- Some study notes. Good Good Study Day Day Up! ☆12 · Mar 1, 2023 · Updated 3 years ago
- Simple, fast dictionary-based language detector for short texts. ☆20 · Feb 5, 2026 · Updated last month
- The CRATOS proxy API integrates with your MISP instance and allows to extract indicators that can be consumed by security components such… ☆13 · Sep 21, 2025 · Updated 5 months ago
- Official repository of "Efficient and Effective Query Expansion for Web Search", Short Paper @ CIKM 2018 ☆15 · Nov 17, 2019 · Updated 6 years ago
- Extract the differences between two HTML pages ☆33 · Feb 10, 2026 · Updated last month
- Proxy-On-Demand: A serverless HTTP(S) proxy on AWS lambda ☆13 · Nov 18, 2023 · Updated 2 years ago
- A Flask LIME explainer app for fine-grained sentiment classification. ☆12 · May 1, 2023 · Updated 2 years ago
- A classifier for detecting soft 404 pages ☆58 · Feb 10, 2026 · Updated last month
- Python module for Named Entity Recognition (NER) using natural language processing. ☆13 · May 30, 2021 · Updated 4 years ago
- Find elements in HTML by matching them with a skeleton ☆25 · Jul 6, 2022 · Updated 3 years ago
- A minimalist flight search engine written in Python ☆12 · Aug 4, 2017 · Updated 8 years ago
- A fast TLS Cert scanner to scan HTTPS and SMTP servers ☆14 · Sep 18, 2019 · Updated 6 years ago
- Kaiba is No-Code Configurable JSON data transformation ☆14 · Apr 8, 2024 · Updated last year
- Source code of SniperOJ running on server right now ☆12 · Oct 23, 2018 · Updated 7 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- ☆16 · Apr 24, 2024 · Updated last year
- Variable-order CRFs with structure learning ☆17 · Aug 1, 2024 · Updated last year
- Investigating multilingual language models (BERT) by using them for NER in German and English ☆14 · Apr 30, 2019 · Updated 6 years ago
- Slack Bot using Python and FastAPI 🐍 ☆15 · Nov 7, 2022 · Updated 3 years ago
- (Unofficial) Python API for http://netcraft.com ☆15 · Jul 6, 2016 · Updated 9 years ago