ukwa / webarchive-explorer

Tools for exploring the contents of web archive files.

☆40

Alternatives and similar repositories for webarchive-explorer

Users who are interested in webarchive-explorer compare it to the libraries listed below.

iipc / webarchive-commons
Common web archive utility code.
☆56 Updated 2 weeks ago
alard / warc-proxy
Serving content from a WARC
☆62 Updated 12 years ago
iipc / jwarc
Java library for reading and writing WARC files with a typed API
☆50 Updated 2 months ago
iipc / warc-specifications
Centralised repository for WARC usage specifications.
☆118 Updated last month
ikreymer / webarchive-indexing
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
☆46 Updated 7 years ago
helgeho / ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…
☆153 Updated last month
internetarchive / surt
Sort-friendly URI Reordering Transform (SURT) python module
☆44 Updated 2 months ago
internetarchive / umbra
A queue-controlled browser automation tool for improving web crawl quality
☆63 Updated 3 months ago
ukwa / webarchive-discovery
Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t…
☆130 Updated 3 months ago
archivesunleashed / aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
☆147 Updated last year
internetarchive / warc
Python library for reading and writing warc files
☆244 Updated 3 years ago
lockss / lockss-daemon
Classic LOCKSS System (LOCKSS 1.x)
☆67 Updated this week
helgeho / Web2Warc
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
☆25 Updated 8 years ago
lintool / warcbase
Warcbase is an open-source platform for managing and analyzing web archives
☆161 Updated 7 years ago
internetarchive / trough
Trough: Big data, small databases.
☆40 Updated last year
nla / outbackcdx
Web archive index server based on RocksDB
☆36 Updated 2 weeks ago
internetarchive / warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
☆167 Updated 2 months ago
netarchivesuite / solrwayback
A search interface and wayback machine for the UKWA Solr based warc-indexer framework.
☆131 Updated last week
openpreserve / fido
Format Identification for Digital Objects (FIDO) is a Python command-line tool to identify the file formats of digital objects. It is des…
☆158 Updated 8 months ago
WASAPI-Community / data-transfer-apis
WASAPI data transfer APIs
☆47 Updated 3 years ago
sweble / sweble-wikitext
The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW…
☆73 Updated last year
frictionlessdata / datapackage-java
A Java library for working with Frictionless Data Data Packages.
☆23 Updated 2 weeks ago
zorba-processor / zorba
Zorba - the NoSQL processor
☆42 Updated last year
mitre / rhapsode
Advanced desktop search/corpus exploration prototype
☆21 Updated 4 years ago
wikimedia / wikidata-query-blazegraph
Github mirror of "wikidata/query/blazegraph" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer…
☆15 Updated 5 years ago
vinaygoel / archive-analysis
Tools to analyze web archives
☆20 Updated 9 years ago
FUB-HCC / neonion
neonion is a user-centered collaborative semantic annotation webapp developed at the Human-Centered Computing group at Freie Universität …
☆68 Updated 6 years ago
ikreymer / browsertrix
(Note: This repository is obsolete, please see the new Browsertrix webrecorder/browsertrix) Browser-Based On-Demand Web Archiving Automat…
☆39 Updated 6 years ago
oflimm / openbib
OpenBib discovery infrastructure
☆10 Updated this week
alard / megawarc
Nondestructive warc-in-tar to warc conversion
☆27 Updated 12 years ago