arquivo / pwa-technologies
Arquivo.pt's main goal is to preserve and provide access to web content that is no longer available online. While developing the PWA IR (information retrieval) system, we faced limitations in search speed, result quality, scalability, and usability. To cope with this, we modified the archive-access project (http://archive-access.sourc…
☆43 · Updated 2 months ago
Alternatives and similar repositories for pwa-technologies:
Users who are interested in pwa-technologies are comparing it to the libraries listed below.
- Command line tool for digging into WARC files ☆39 · Updated this week
- CDXJ Indexing of WARC/ARCs ☆25 · Updated 3 months ago
- A tool for collecting archival slivers of the web and web archives ☆13 · Updated last month
- A command line utility for listing and searching snapshots in web archives ☆16 · Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 · Updated 8 months ago
- A social media open post web archiving tool ☆25 · Updated 3 weeks ago
- Webrecorder Automated In-Page Behavior Framework ☆13 · Updated 3 years ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆14 · Updated 3 years ago
- A listing of world wide web archives, for humans and machines using Web Archive Manifest (WAM) YAML format ☆52 · Updated 2 years ago
- Search interface for scholarly works ☆84 · Updated 8 months ago
- A PDF classifier ensemble with REST API service ☆23 · Updated 4 years ago
- ☆10 · Updated 3 years ago
- Web archive index server based on RocksDB ☆34 · Updated 4 months ago
- Specification for authentication and creating signed WACZ Files ☆10 · Updated 3 years ago
- A Memento TimeGate ☆42 · Updated 4 years ago
- A Memento Aggregator CLI and Server in Go ☆62 · Updated 3 weeks ago
- Converts WARC files to static HTML ☆44 · Updated 9 months ago
- Web application for distributed compute analysis of Archive-It web archive collections. ☆16 · Updated 2 weeks ago
- A GitHub Action for turning Markdown into ReSpec HTML ☆14 · Updated 9 months ago
- Object Resource Stream and CDXJ Drafts ☆14 · Updated 6 years ago
- Link Wikidata items to large catalogs ☆96 · Updated last month
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆29 · Updated 2 weeks ago
- A tool for detecting viruses and NSFW material in WARC files ☆11 · Updated 7 months ago
- Scraper for German democracy documents ☆37 · Updated last year
- Create and edit WARC and WACZ files ☆9 · Updated 3 months ago
- JavaScript module and CLI tool for working with web archive data using the WACZ format specification. ☆13 · Updated 3 weeks ago
- Conifer setup and deployment via Ansible ☆12 · Updated 4 years ago
- Tool to import files from the Internet Archive to Wikimedia Commons. ☆16 · Updated last month
- ☆27 · Updated 2 years ago
- A Rails engine supporting the discovery of web archives. ☆50 · Updated last year