ukwa / ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.
☆11Updated 5 months ago
Alternatives and similar repositories for ukwa-heritrix
Users who are interested in ukwa-heritrix are comparing it to the libraries listed below.
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework.☆113Updated this week
- Java library for reading and writing WARC files with a typed API☆48Updated 4 months ago
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…☆150Updated last week
- WARC and ARC indexing and discovery tools.☆123Updated 2 months ago
- Siegfried-based characterization tool for directories and disk images☆84Updated 5 months ago
- The study group Bits and Bots accommodates digital preservation professionals seeking coding abilities. In this repository, you can find …☆40Updated 3 weeks ago
- Experimental continuous web crawler for web archiving☆9Updated 2 years ago
- Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes, break the export into its components and store …☆25Updated last year
- File Information Tool Set☆94Updated 2 months ago
- A persistent repository for PRONOM Research Week activities☆12Updated 3 years ago
- WASAPI data transfer APIs☆44Updated 3 years ago
- Single server/laptop grade file-observatory☆10Updated 2 years ago
- The objective of this script is to allow archivists to find groups of records that may be inactive because of their age.☆10Updated 8 years ago
- Rails application for the Archives Unleashed Cloud.☆11Updated 3 years ago
- QA Catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA, UNIMARC)☆83Updated this week
- Pre-Ingest Tool for creating submission information packages☆22Updated 8 months ago
- Carefully curated list of awesome digital preservation resources.☆90Updated 3 months ago
- Identify, review, and remove sensitive files☆29Updated 2 years ago
- A tool for creating and managing Mailbags, a package for preserving email using multiple preservation formats☆47Updated 9 months ago
- Tool and library for handling Web ARChive (WARC) files.☆158Updated 7 months ago
- Shepherding our web archives from crawl to access.☆10Updated last year
- Prototype wikidata portal project.☆10Updated last year
- Streaming WARC/ARC library for fast web archive IO☆413Updated 5 months ago
- Computer-Aided Metadata Generation for Photoarchives Initiative☆16Updated 4 years ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC)☆85Updated 3 weeks ago
- The Bagger application packages data files according to the BagIt specification.☆127Updated 2 years ago
- Tools used for harmful language description audit in Duke's Rubenstein Library, including binaries, documentation, and source code for pu…☆19Updated 3 years ago
- DEPRECATED. Replaced with Electron desktop application: https://github.com/bulk-reviewer/bulk-reviewer☆13Updated 6 years ago