arquivo / pwa-technologies
Arquivo.pt's main goal is to preserve and provide access to web content that is no longer available online. While developing the PWA IR (information retrieval) system, we faced limitations in search speed, quality of results, scalability and usability. To cope with this, we modified the archive-access project (http://archive-access.sourc…
☆43, updated 3 weeks ago
Alternatives and similar repositories for pwa-technologies:
Users who are interested in pwa-technologies compare it to the libraries listed below:
- Command line tool for digging into WARC files (☆38, updated this week)
- CDXJ Indexing of WARC/ARCs (☆25, updated 2 months ago)
- Web archive index server based on RocksDB (☆34, updated 2 months ago)
- Support for writing WARC files with Scrapy (☆21, updated 5 years ago)
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives (☆14, updated 3 years ago)
- Sort-friendly URI Reordering Transform (SURT) Python module (☆41, updated 6 months ago)
- A listing of World Wide Web archives, for humans and machines, using the Web Archive Manifest (WAM) YAML format (☆51, updated 2 years ago)
- A Rails engine supporting the discovery of web archives. (☆50, updated last year)
- A command line utility for listing and searching snapshots in web archives (☆15, updated last year)
- A PDF classifier ensemble with REST API service (☆23, updated 3 years ago)
- Digital Preservation of HTTP in documentary heritage. (☆22, updated last year)
- WASAPI data transfer APIs (☆43, updated 2 years ago)
- A GitHub Action for turning Markdown into ReSpec HTML (☆14, updated 8 months ago)
- Converts WARC files to static HTML (☆43, updated 7 months ago)
- Webrecorder Automated In-Page Behavior Framework (☆13, updated 3 years ago)
- WARC and ARC indexing and discovery tools. (☆121, updated 6 months ago)
- JavaScript module and CLI tool for working with web archive data using the WACZ format specification. (☆13, updated last week)
- Specification for authentication and creating signed WACZ Files (☆10, updated 3 years ago)
- Centralised repository for WARC usage specifications. (☆105, updated 2 months ago)
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. (☆30, updated last month)
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki (☆25, updated 6 months ago)
- Tools for helping you work with web platform archive downloads. (☆17, updated 4 years ago)
- A set of utilities for processing MediaWiki XML dump data. (☆50, updated this week)
- A tool for detecting viruses and NSFW material in WARC files (☆11, updated 6 months ago)
- Nondestructive warc-in-tar to warc conversion (☆26, updated 11 years ago)
- Static Site Generator for Viewing Web Archives (in WACZ format) (☆22, updated last year)