arquivo / pwa-technologies
Arquivo.pt's main goal is to preserve and provide access to web content that is no longer available online. While developing the PWA IR (information retrieval) system, we faced limitations in search speed, result quality, scalability, and usability. To cope with this, we modified the archive-access project (http://archive-access.sourc…
☆41 · Updated last year
Related projects
Alternatives and complementary repositories for pwa-technologies
- Command line tool for digging into WARC files ☆34 · Updated 3 weeks ago
- A listing of world wide web archives, for humans and machines using Web Archive Manifest (WAM) YAML format ☆43 · Updated last year
- CDXJ Indexing of WARC/ARCs ☆21 · Updated last week
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆13 · Updated 3 years ago
- Converts WARC files to static HTML ☆39 · Updated 4 months ago
- Static Site Generator for Viewing Web Archives (in WACZ format) ☆21 · Updated last year
- Specifications of the reconciliation API ☆33 · Updated last week
- A Rails engine supporting the discovery of web archives. ☆49 · Updated last year
- Centralised repository for WARC usage specifications. ☆100 · Updated this week
- Support for writing WARC files with Scrapy ☆20 · Updated 4 years ago
- Sort-friendly URI Reordering Transform (SURT) Python module ☆40 · Updated 3 months ago
- A PDF classifier ensemble with REST API service ☆23 · Updated 3 years ago
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification. ☆23 · Updated 9 months ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 · Updated 3 months ago
- An image annotation and publication tool ☆27 · Updated 4 years ago
- A social media open post web archiving tool ☆25 · Updated last month
- Web application for distributed compute analysis of Archive-It web archive collections. ☆15 · Updated 2 months ago
- Specification for authentication and creating signed WACZ Files ☆9 · Updated 2 years ago
- Tools for helping you work with web platform archive downloads. ☆17 · Updated 4 years ago
- Webrecorder Automated In-Page Behavior Framework ☆12 · Updated 3 years ago
- JavaScript module and CLI tool for working with web archive data using the WACZ format specification. ☆13 · Updated 2 months ago
- The Digital Humanities Literacy Guidebook ☆61 · Updated 2 years ago
- Digital Preservation of HTTP in documentary heritage. ☆22 · Updated last year
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages. ☆23 · Updated last week
- Comparing WARC files ☆15 · Updated 5 years ago
- Trough: Big data, small databases. ☆40 · Updated 3 months ago
- A Memento TimeGate ☆40 · Updated 4 years ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆102 · Updated last week