openpreserve / pagelyzer
Suite of tools for detecting changes in web pages and their rendering
☆54 · Updated last year
Alternatives and similar repositories for pagelyzer
Users who are interested in pagelyzer are comparing it to the repositories listed below.
- Tools for web page segmentation. In development. ☆17 · Updated 6 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 · Updated 8 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 · Updated 7 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Updated 7 years ago
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts ☆59 · Updated 12 years ago
- CommonCrawl WARC/WET/WAT examples and processing code for Java + Hadoop ☆56 · Updated 4 years ago
- D3 and Play based visualization for entity-relation graphs, especially for NLP and information extraction ☆30 · Updated 9 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 · Updated 3 years ago
- Common Crawl Index Server ☆68 · Updated 3 months ago
- Site Hound (previously THH) is a Domain Discovery Tool