ukwa / ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.
☆11 Updated 6 months ago
Alternatives and similar repositories for ukwa-heritrix
Users who are interested in ukwa-heritrix are comparing it to the libraries listed below.
- Java library for reading and writing WARC files with a typed API ☆48 Updated 6 months ago
- WARC and ARC indexing and discovery tools. ☆124 Updated 3 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆115 Updated 3 weeks ago
- Single server/laptop grade file-observatory ☆10 Updated 2 years ago
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆150 Updated last week
- Carefully curated list of awesome digital preservation resources. ☆92 Updated 3 weeks ago
- Streaming WARC/ARC library for fast web archive IO ☆416 Updated 6 months ago
- The study group Bits and Bots accommodates digital preservation professionals seeking coding abilities. In this repository, you can find … ☆41 Updated this week
- An un-official user guide for the KryoFlux written by archivists, for archivists ☆97 Updated last year
- ☆10 Updated 2 months ago
- Prototype wikidata portal project. ☆10 Updated last year
- A persistent repository for PRONOM Research Week activities ☆12 Updated 4 years ago
- A collection of tools for archiving and analysing the internet. ☆77 Updated 2 years ago
- Loader software for automated imaging of optical media with Nimbie disc robot ☆35 Updated 3 months ago
- "checkit_tiff" is an incredibly fast conformance checker for baseline TIFFs (with various extensions) ☆14 Updated 3 years ago
- NARA digital preservation file format risk analysis and preservation plans ☆228 Updated 3 weeks ago
- Nanite - a friendly swarm of format-identifying robots. ☆16 Updated last year
- Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes, break the export into its components and store … ☆25 Updated last year
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆85 Updated 2 months ago
- ☆25 Updated 2 years ago
- Siegfried-based characterization tool for directories and disk images ☆83 Updated 6 months ago
- Archive Research Services Workshop ☆31 Updated 7 years ago
- File validation and characterisation. ☆180 Updated last month
- Tool and library for handling Web ARChive (WARC) files. ☆160 Updated 8 months ago
- DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway. ☆23 Updated 2 weeks ago
- Code for processing and archiving emails ☆14 Updated 10 months ago
- Automating description for Web Archives in ArchivesSpace using the Archive-It CDX and Partner Data APIs ☆11 Updated 6 years ago
- ☆34 Updated 4 months ago
- Python library for reading and writing warc files ☆241 Updated 3 years ago
- NOTE: This project is no longer being actively developed. Check out https://replayweb.page / https://github.com/webrecorder/replayweb.pa… ☆201 Updated 5 months ago