ukwa / ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.
☆11 Updated 4 months ago
Alternatives and similar repositories for ukwa-heritrix:
Users who are interested in ukwa-heritrix are comparing it to the libraries listed below:
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆110 Updated 2 months ago
- Rails application for the Archives Unleashed Cloud. ☆11 Updated 3 years ago
- WARC and ARC indexing and discovery tools. ☆122 Updated 3 weeks ago
- Java library for reading and writing WARC files with a typed API ☆48 Updated 3 months ago
- Single server/laptop grade file-observatory ☆10 Updated 2 years ago
- Collaborative collection development for web archives ☆18 Updated 5 years ago
- Archive Research Services Workshop ☆31 Updated 7 years ago
- WASAPI data transfer APIs ☆44 Updated 2 years ago
- Centralised repository for WARC usage specifications. ☆109 Updated 4 months ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆85 Updated 2 weeks ago
- ☆23 Updated last year
- Web application for distributed compute analysis of Archive-It web archive collections. ☆16 Updated 2 weeks ago
- Experimental continuous web crawler for web archiving ☆9 Updated 2 years ago
- Web archive index server based on RocksDB ☆34 Updated 4 months ago
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆148 Updated 2 months ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆158 Updated 4 years ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆41 Updated 8 months ago
- A persistent repository for PRONOM Research Week activities ☆12 Updated 3 years ago
- Converts WARC files to static HTML ☆44 Updated 9 months ago
- Prototype wikidata portal project. ☆10 Updated 11 months ago
- ☆14 Updated 8 years ago
- Siegfried-based characterization tool for directories and disk images ☆84 Updated 3 months ago
- ☆11 Updated last year
- Description des formats de fichier ☆11 Updated 3 years ago
- A collection of tools for archiving and analysing the internet. ☆72 Updated 2 years ago
- Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes, break the export into its components and store … ☆25 Updated 10 months ago
- Automating description for Web Archives in ArchivesSpace using the Archive-It CDX and Partner Data APIs ☆11 Updated 6 years ago
- DHLAB is a library of python modules for accessing text and pictures at the National Library of Norway. ☆22 Updated last month
- Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archive… ☆26 Updated 2 years ago
- An un-official user guide for the KryoFlux written by archivists, for archivists ☆94 Updated last year