matiskay / html-similarity
Compare html similarity using structural and style metrics
☆218 · May 11, 2023 · Updated 2 years ago
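The one-line description above can be made concrete with a small sketch. It rests on several assumptions. "Structural" similarity is taken to mean comparing the sequences of tag names with `difflib.SequenceMatcher`. "Style" similarity is taken to mean the Jaccard overlap of the CSS class sets. The two scores are blended with a weight `k`. The function names (`structural_sim`, `style_sim`, `combined_sim`) and the 0.5 default are hypothetical. They are not the html-similarity package's API, and the real package may define its metrics differently.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Collects start-tag names (in order) and CSS class names from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        for name, value in attrs:
            # attribute values can be None (e.g. <div class>), so guard first
            if name == "class" and value:
                self.classes.update(value.split())


def _collect(html: str) -> _TagCollector:
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical helper: structure = similarity of the tag-name sequences, in [0, 1].
def structural_sim(a: str, b: str) -> float:
    return SequenceMatcher(None, _collect(a).tags, _collect(b).tags).ratio()


# Hypothetical helper: style = Jaccard similarity of the CSS class sets, in [0, 1].
def style_sim(a: str, b: str) -> float:
    ca, cb = _collect(a).classes, _collect(b).classes
    if not ca and not cb:
        return 1.0  # assumption: two pages with no classes count as identical in style
    return len(ca & cb) / len(ca | cb)


# Hypothetical weighted blend: k * structural + (1 - k) * style.
def combined_sim(a: str, b: str, k: float = 0.5) -> float:
    return k * structural_sim(a, b) + (1 - k) * style_sim(a, b)
```

For example, `<div class="x y"><p>hi</p></div>` and `<div class="x"><p>hi</p></div>` have the same tag sequence, so the structural score is 1.0. They share one of three classes, so the style score is 0.5, and the blend at `k = 0.5` is 0.75. Using a sequence matcher keeps tag order significant. The class-set comparison, by contrast, ignores where in the page a class appears.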
Alternatives and similar repositories for html-similarity
Users interested in html-similarity are comparing it to the libraries listed below.
- Simple heuristic for measuring web page similarity (& data set) ☆90 · Jan 22, 2026 · Updated 3 weeks ago
- A toolkit for clustering web pages based on various similarity measures. ☆34 · Oct 27, 2021 · Updated 4 years ago
- Generates the most important key phrases/keywords from a document based on a corpus ☆10 · Jun 17, 2024 · Updated last year
- Automatic Item List Extraction ☆86 · Jun 15, 2016 · Updated 9 years ago
- Intelligent Web Data Extractor ☆74 · Dec 5, 2022 · Updated 3 years ago
- On-the-fly Table Generation (SIGIR'18) ☆10 · Feb 1, 2020 · Updated 6 years ago
- Infer Types by Python Tracing ☆11 · Aug 29, 2022 · Updated 3 years ago
- Web-based IDE for Python, Scheme, and SQL intended for students taking CS 61A. ☆11 · Dec 10, 2022 · Updated 3 years ago
- Lazy reading of file objects for efficient batch processing ☆10 · Sep 6, 2017 · Updated 8 years ago
- BiLSTM+CRF ☆10 · Jan 15, 2019 · Updated 7 years ago
- Implementation of Cascaded Head-colliding Attention (ACL'2021) ☆11 · Sep 16, 2021 · Updated 4 years ago
- ☆13 · Jun 14, 2016 · Updated 9 years ago
- ☆12 · Apr 29, 2022 · Updated 3 years ago
- ☆12 · Jan 22, 2020 · Updated 6 years ago
- Extract the differences between two HTML pages ☆32 · Updated this week
- Find elements in HTML by matching them with a skeleton ☆25 · Jul 6, 2022 · Updated 3 years ago
- Automation script to download JSON MISP files from an SFTP server and import them via API to a MISP instance. ☆15 · May 12, 2023 · Updated 2 years ago
- Official repository of "Efficient and Effective Query Expansion for Web Search", Short Paper @ CIKM 2018 ☆15 · Nov 17, 2019 · Updated 6 years ago
- ☆12 · Jan 21, 2019 · Updated 7 years ago
- PHP low-level client for Vespa. https://vespa.ai/ ☆17 · Jan 22, 2026 · Updated 3 weeks ago
- Proxy-On-Demand: A serverless HTTP(S) proxy on AWS Lambda ☆13 · Nov 18, 2023 · Updated 2 years ago
- A Flask LIME explainer app for fine-grained sentiment classification. ☆12 · May 1, 2023 · Updated 2 years ago
- A fast TLS certificate scanner for HTTPS and SMTP servers ☆14 · Sep 18, 2019 · Updated 6 years ago
- Source code for the paper "Web2Text: Deep Structured Boilerplate Removal", full paper @ ECIR'18 ☆170 · Oct 28, 2021 · Updated 4 years ago
- Kaiba is a no-code, configurable JSON data transformation tool ☆14 · Apr 8, 2024 · Updated last year
- Source code of SniperOJ as currently running on the server ☆12 · Oct 23, 2018 · Updated 7 years ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 · Jan 16, 2024 · Updated 2 years ago
- Investigating multilingual language models (BERT) by using them for NER in German and English ☆14 · Apr 30, 2019 · Updated 6 years ago
- Scrape a website and deploy to Amazon S3 to generate a serverless website. ☆13 · May 30, 2018 · Updated 7 years ago
- ☆18 · Jun 12, 2023 · Updated 2 years ago
- (Unofficial) Python API for http://netcraft.com ☆15 · Jul 6, 2016 · Updated 9 years ago
- A simple algorithm for clustering web pages, suitable for crawlers ☆35 · Mar 6, 2017 · Updated 8 years ago
- Golang implementation of PyMISP-feedgenerator ☆18 · Jul 31, 2022 · Updated 3 years ago
- TextFlows is an open-source online platform for composition, execution, and sharing of interactive text mining and natural language proce… ☆19 · Dec 1, 2017 · Updated 8 years ago
- Code for "Boosted Generative Models", AAAI 2018. ☆20 · Dec 26, 2017 · Updated 8 years ago
- Near real-time Twitter network visualisation ☆16 · Feb 4, 2022 · Updated 4 years ago
- Elemental makes Selenium automation faster and easier. ☆36 · Nov 16, 2023 · Updated 2 years ago
- A Scrapy extension to log item coverage when the spider shuts down ☆19 · Apr 11, 2020 · Updated 5 years ago
- 8-bit Raspberry Pi game ☆14 · Jan 19, 2017 · Updated 9 years ago