arquivo / pwa-technologies
Arquivo.pt's main goal is to preserve and provide access to web content that is no longer available online. During the development of the PWA IR (information retrieval) system, we faced limitations in search speed, quality of results, scalability and usability. To cope with this, we modified the archive-access project (http://archive-access.sourc…
☆51 Updated 2 months ago
Alternatives and similar repositories for pwa-technologies
Users that are interested in pwa-technologies are comparing it to the libraries listed below
- Converts WARC files to static HTML ☆49 Updated last month
- Centralised repository for WARC usage specifications. ☆117 Updated last week
- A tool for collecting archival slivers of the web and web archives ☆15 Updated 8 months ago
- Command line tool for digging into WARC files ☆46 Updated 3 weeks ago
- A social media open post web archiving tool ☆27 Updated 2 weeks ago
- ☆27 Updated 3 years ago
- Webrecorder Automated In-Page Behavior Framework ☆13 Updated 4 years ago
- A Rails engine supporting the discovery of web archives. ☆50 Updated 2 years ago
- Fast PDF generation and compression. Deals with millions of pages daily. ☆125 Updated last month
- Scraper for German democracy documents ☆38 Updated 2 years ago
- CDXJ Indexing of WARC/ARCs ☆29 Updated 10 months ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- A command line utility for listing and searching snapshots in web archives ☆17 Updated last year
- ☆53 Updated last year
- The repo for the PetScan tool ☆56 Updated this week
- Sort-friendly URI Reordering Transform (SURT) python module ☆44 Updated last month
- Trough: Big data, small databases. ☆41 Updated last year
- Web archive index server based on RocksDB ☆35 Updated last month
- Specifications developed and maintained by the Webrecorder community. ☆136 Updated this week
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆50 Updated last week
- A PDF classifier ensemble with REST API service ☆23 Updated 4 years ago
- freeyourstuff.cc - universal content liberation ☆81 Updated 2 years ago
- A Memento Aggregator CLI and Server in Go ☆68 Updated 7 months ago
- search interface for scholarly works ☆86 Updated last year
- A Github Action for turning Markdown into ReSpec HTML ☆14 Updated last year
- A listing of world wide web archives, for humans and machines using Web Archive Manifest (WAM) yaml format ☆53 Updated 2 years ago
- Tool to import files from the Internet Archive to Wikimedia Commons. ☆18 Updated 2 months ago
- Comparing warc files ☆17 Updated 6 years ago
- Searchable Linkable Open Public Indexed (SLOPI) Communication ☆21 Updated 2 years ago
- Classic LOCKSS System (LOCKSS 1.x) ☆67 Updated this week