N0taN3rd / node-warc
Parse And Create Web ARChive (WARC) files with node.js
☆94 Updated last year
Related projects
Alternatives and complementary repositories for node-warc
- wabac.js - Web Archive Browsing Augmentation Client ☆100 Updated this week
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆169 Updated 4 years ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆88 Updated 3 years ago
- JS Streaming WARC IO optimized for Browser and Node ☆35 Updated this week
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆81 Updated last week
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆23 Updated this week
- Automatically extracts structured information from webpages ☆108 Updated 2 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆152 Updated 4 years ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆102 Updated last week
- Wombat.js client-side rewriting library ☆84 Updated this week
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆92 Updated 6 years ago
- Webrecorder Automated In-Page Behavior Framework ☆12 Updated 3 years ago
- ⚙️ [Processor] A better English POS tagger written in JavaScript ☆53 Updated 7 years ago
- Snapshots a web page to get it as a static, self-contained HTML document. ☆271 Updated 2 years ago
- Specifications developed and maintained by the Webrecorder community. ☆124 Updated this week
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆145 Updated 2 months ago
- Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more … ☆201 Updated this week
- Quickly estimate the similarity between many sets ☆50 Updated last year
- A lightweight JavaScript client library for the Wikimedia Pageviews API for Wikipedia and various of its sister projects for Node.js and … ☆27 Updated 3 years ago
- Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder) ☆439 Updated 4 years ago
- A Memento Aggregator CLI and Server in Go ☆57 Updated 6 months ago
- English Part-of-speech (POS) tagger ☆65 Updated last year
- A list of tools related to W(eb)ARC(hive) ☆54 Updated 10 years ago
- Formula to detect ease of reading according to the Automated Readability Index (1967) ☆52 Updated 2 years ago
- Throw JavaScript objects at the index and they will become retrievable by their properties using promises and map-reduce ☆19 Updated 2 months ago
- Parse WARC (Web Archive Files) as a node.js stream ☆22 Updated 10 years ago
- Vanilla JavaScript implementation of the Weighted PageRank Algorithm ☆33 Updated 5 years ago
- WARC and ARC indexing and discovery tools. ☆117 Updated 3 months ago
- ☆56 Updated last year
- ECMAScript libraries for handling RDF data (based off of the current RDF APIs and webr3's js3) ☆94 Updated 2 years ago