ukwa / ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.
☆11Updated 11 months ago
Alternatives and similar repositories for ukwa-heritrix
Users who are interested in ukwa-heritrix are comparing it to the libraries listed below.
- Java library for reading and writing WARC files with a typed API☆50Updated last month
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…☆153Updated last month
- Streaming WARC/ARC library for fast web archive IO☆438Updated 11 months ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework.☆131Updated last week
- An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.☆200Updated 5 months ago
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t…☆130Updated 3 months ago
- Single server/laptop grade file-observatory☆10Updated 2 years ago
- Common web archive utility code.☆56Updated 2 weeks ago
- File validation and characterisation.☆184Updated last week
- ☆26Updated 2 years ago
- Carefully curated list of awesome digital preservation resources.☆104Updated 3 months ago
- NARA digital preservation file format risk analysis and preservation plans☆232Updated 2 months ago
- ☆10Updated 6 months ago
- DROID (Digital Record and Object Identification)☆345Updated this week
- ☆35Updated this week
- Format Identification for Digital Objects (FIDO) is a Python command-line tool to identify the file formats of digital objects. It is des…☆158Updated 8 months ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine☆187Updated last week
- Efficient indexing and retrieval of OCR bounding boxes in Solr☆22Updated 6 years ago
- A collection of tools for archiving and analysing the internet.☆78Updated 3 years ago
- Loader software for automated imaging of optical media with Nimbie disc robot☆36Updated 8 months ago
- A tool for detecting viruses and NSFW material in WARC files☆17Updated last year
- Centralised repository for WARC usage specifications.☆118Updated last month
- The study group Bits and Bots accommodates digital preservation professionals seeking coding abilities. In this repository, you can find …☆41Updated 3 months ago
- Siegfried-based characterization tool for directories and disk images☆86Updated 11 months ago
- Python package for harvesting records from OAI-PMH provider(s).☆64Updated 3 years ago
- Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes, break the export into its components and store …☆29Updated last year
- Convert Directories, Files and ZIP Files to Web Archives (WARC)☆89Updated 6 months ago
- File Information Tool Set☆96Updated last week
- Sort-friendly URI Reordering Transform (SURT) python module☆44Updated 2 months ago
- A NoSketch Engine Docker image which is easy to use☆20Updated last month