ukwa / webarchive-explorer
Tools for exploring the contents of web archive files.
☆40 Updated 4 years ago
Alternatives and similar repositories for webarchive-explorer
Users who are interested in webarchive-explorer are comparing it to the libraries listed below.
- Centralised repository for WARC usage specifications. ☆115 Updated 9 months ago
- Common web archive utility code. ☆56 Updated last month
- Serving content from a WARC ☆62 Updated 12 years ago
- Java library for reading and writing WARC files with a typed API ☆50 Updated last month
- Python library for reading and writing warc files ☆244 Updated 3 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆73 Updated last year
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆152 Updated 3 weeks ago
- A queue-controlled browser automation tool for improving web crawl quality ☆61 Updated 2 weeks ago
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆128 Updated last month
- Warcbase is an open-source platform for managing and analyzing web archives ☆162 Updated 7 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- A crawler for the Linked Data web ☆36 Updated 7 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆162 Updated last week
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. ☆145 Updated last year
- Sort-friendly URI Reordering Transform (SURT) python module ☆42 Updated last year
- An HTTP-based warc-to-zip converter ☆12 Updated 12 years ago
- Web archive index server based on RocksDB ☆34 Updated last month
- Blazegraph Tinkerpop3 Implementation ☆62 Updated 4 years ago
- CSV Validation Tool and API (CSV Schema RI) ☆216 Updated last week
- RDFpro ☆12 Updated 3 years ago
- LDIF - Linked Data Integration Framework ☆37 Updated 9 years ago
- Tool and library for handling Web ARChive (WARC) files. ☆163 Updated 10 months ago
- RDF store on a cloud-based architecture (previously on https://code.google.com/p/cumulusrdf) ☆31 Updated 9 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆73 Updated 8 years ago
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more ☆22 Updated 8 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- CiteSeerX public repository ☆133 Updated last year
- Github mirror of "wikidata/query/rdf" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access… ☆151 Updated 2 weeks ago
- Mirror of Apache Stanbol (incubating) ☆114 Updated last year
- The OpenWayback Development ☆503 Updated last year