ukwa / webarchive-explorer
Tools for exploring the contents of web archive files.
☆39 · Updated 3 years ago
Related projects:
- Common web archive utility code. ☆50 · Updated last week
- Centralised repository for WARC usage specifications. ☆98 · Updated last month
- A queue-controlled browser automation tool for improving web crawl quality ☆60 · Updated 4 years ago
- Tools to analyze web archives ☆20 · Updated 8 years ago
- Fcrepo4 webapp plus optional fcrepo dependencies ☆13 · Updated 3 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆41 · Updated 6 years ago
- Trough: Big data, small databases. ☆38 · Updated last month
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 · Updated 6 years ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆39 · Updated last month
- WARC and ARC indexing and discovery tools. ☆114 · Updated last month
- Using social media to steer web archiving and curation. ☆15 · Updated 8 years ago
- (Note: This repository is obsolete, please see the new Browsertrix webrecorder/browsertrix) Browser-Based On-Demand Web Archiving Automat… ☆39 · Updated 5 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆39 · Updated 8 years ago
- Serving content from a WARC ☆60 · Updated 11 years ago
- Web archive index server based on RocksDB ☆31 · Updated last week
- Python script to create CDX index files of WARC data ☆20 · Updated 2 years ago
- A Memento Aggregator CLI and Server in Go ☆55 · Updated 3 months ago
- Python script to create CDX index files of WARC data ☆14 · Updated 6 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆24 · Updated last month
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆44 · Updated 5 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆87 · Updated last year
- Java library for reading and writing WARC files with a typed API ☆46 · Updated 2 months ago
- CDXJ Indexing of WARC/ARCs ☆21 · Updated 3 months ago
- Prototype SOLR-powered web archive exploration UI. ☆42 · Updated 4 years ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆13 · Updated 3 years ago
- Internet Archive Data Mining Tools ☆44 · Updated 3 years ago
- utility to fetch provenance information from Internet Archive's Wayback Machine ☆13 · Updated 2 years ago
- ☆14 · Updated this week
- A set of utilities for accessing and processing MediaWiki data. ☆55 · Updated 5 years ago
- Free-form web data notebook - "Data management for little guys" ☆25 · Updated last year