N0taN3rd / node-warc
Parse And Create Web ARChive (WARC) files with node.js
☆103 Updated 10 months ago
Alternatives and similar repositories for node-warc
Users who are interested in node-warc are comparing it to the libraries listed below.
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆172 Updated 5 years ago
- JS Streaming WARC IO optimized for Browser and Node ☆52 Updated 3 months ago
- wabac.js - Web Archive Browsing Augmentation Client ☆116 Updated 2 weeks ago
- Chrome extension to "Create WARC files from any webpage" ☆226 Updated 2 weeks ago
- Specifications developed and maintained by the Webrecorder community. ☆137 Updated 2 months ago
- Automatically extracts structured information from webpages ☆109 Updated 3 years ago
- visualise readability ☆213 Updated last year
- Quickly estimate the similarity between many sets ☆53 Updated 3 years ago
- generate rules from lists of words ☆16 Updated 4 years ago
- A list of tools related to W(eb)ARC(hive) ☆65 Updated 11 years ago
- Convert between DOM Range instances and text quotes. ☆35 Updated 2 years ago
- Storex Core - A modular and portable database abstraction ecosystem for JavaScript ☆155 Updated 2 weeks ago
- Accurate and fast sentiment scoring of phrases with #hashtags, emoticons :) & emojis 🎉 ☆63 Updated 2 years ago
- Wombat.js client-side rewriting library ☆107 Updated 2 weeks ago
- Multilingual tokenizer that automatically tags each token with its type ☆63 Updated 2 years ago
- English NLP for Node.js and the browser. ☆87 Updated 2 years ago
- Extract structured data from the web using GraphQL. ☆207 Updated 7 years ago
- Minimal implementations of a couple of classic text analysis tools (TF-IDF and cosine similarity) ☆57 Updated 3 weeks ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON. ☆70 Updated 4 years ago
- Extended Date Time Format (ISO 8601-2 / EDTF) Parser for JavaScript ☆75 Updated 2 months ago
- Fast Metaphone implementation ☆53 Updated 3 years ago
- Parse WARC (Web Archive Files) as a node.js stream ☆23 Updated 11 years ago
- Snapshots a web page to get it as a static, self-contained HTML document. ☆299 Updated 3 years ago
- neato compression for key-value data ☆110 Updated last year
- Convert between DOM Range instances and text positions. ☆26 Updated 5 years ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆38 Updated 3 weeks ago
- Webrecorder Automated In-Page Behavior Framework ☆13 Updated 4 years ago
- Image perceptual hash calculation in javascript ☆175 Updated 5 years ago
- Compress json-data based on its json-schema while still having valid json ☆99 Updated this week