ukwa / ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.
☆11Updated 4 months ago
Alternatives and similar repositories for ukwa-heritrix:
Users who are interested in ukwa-heritrix are comparing it to the repositories listed below
- WARC and ARC indexing and discovery tools.☆123Updated last month
- A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.☆112Updated this week
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…☆150Updated 2 weeks ago
- Java library for reading and writing WARC files with a typed API☆48Updated 4 months ago
- Archive Research Services Workshop☆31Updated 7 years ago
- Experimental continuous web crawler for web archiving☆9Updated 2 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)☆160Updated 4 years ago
- Streaming WARC/ARC library for fast web archive IO☆410Updated 4 months ago
- Single server/laptop grade file-observatory☆10Updated 2 years ago
- WASAPI data transfer APIs☆44Updated 3 years ago
- Converts WARC files to static HTML☆44Updated 10 months ago
- An awesome list for Mirador's projects and plugins.☆42Updated last year
- A collection of tools for archiving and analysing the internet.☆74Updated 2 years ago
- A tool for detecting viruses and NSFW material in WARC files☆14Updated 8 months ago
- DHLAB is a library of Python modules for accessing text and pictures at the National Library of Norway.☆22Updated 2 weeks ago
- Centralised repository for WARC usage specifications.☆110Updated 5 months ago
- Open-source tools for working with BIBFRAME (see: http://bibframe.org), by default BIBFRAME Lite (see: http://bibfra.me) and more general…☆24Updated 3 years ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.☆143Updated last year
- Tool and library for handling Web ARChive (WARC) files.☆157Updated 6 months ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC)☆85Updated this week
- Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes, break the export into its components and store …☆25Updated 11 months ago
- Rails application for the Archives Unleashed Cloud.☆11Updated 3 years ago
- Python library for reading and writing WARC files☆240Updated 3 years ago
- Named Entity Recognition☆19Updated 2 weeks ago
- A persistent repository for PRONOM Research Week activities☆12Updated 3 years ago
- Note: the repo has been moved to https://gitlab.com/readcoop/Transkribus/TranskribusCore☆37Updated 4 years ago
- Automating description for Web Archives in ArchivesSpace using the Archive-It CDX and Partner Data APIs☆11Updated 6 years ago