nla / httrack2warc
Converts HTTrack crawls to WARC files
☆30Updated 3 months ago
Related projects
Alternatives and complementary repositories for httrack2warc
- Home of the official apt/deb package for Ubuntu/Debian-based systems.☆17Updated last month
- Official Python package for ArchiveBox, the self-hosted internet archiving solution.☆13Updated last month
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.☆14Updated 4 years ago
- ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file.☆13Updated 3 years ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution.☆26Updated last month
- Archiving public telegram messages.☆12Updated this week
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.☆50Updated 3 months ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives☆13Updated 3 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch…☆15Updated 9 months ago
- A configurable, reusable tracker with dashboard☆34Updated 11 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format☆45Updated last week
- Encode/decode binary data over a live streaming video in real time.☆13Updated last year
- ☆11Updated 2 years ago
- Warrior virtual machine appliance (version 4)☆21Updated 2 months ago
- A simple, configurable youtube-dl wrapper to download and manage YouTube audio☆133Updated 4 years ago
- This Chrome extension saves all web pages you view to the Wayback Machine☆13Updated 3 years ago
- Archiving URLs (outlinks) from a variety of sources.☆17Updated this week
- Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.☆14Updated this week
- A GUI for TrID (http://mark0.net/soft-trid-e.html)☆14Updated 8 years ago
- Awesome list dedicated to digital and data preservation tools, sources, services and so on.☆20Updated 2 years ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.☆101Updated this week
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A…☆15Updated last month
- A curated list of websites which do not load any obtrusive 3rd party content☆11Updated 2 years ago
- Archiving community contributions on YouTube: unpublished captions, title and description translations, and caption credits☆8Updated 4 years ago
- Rexit - Liberate your Reddit Chats. This tool will export your Reddit chats into a plethora of formats☆20Updated 11 months ago
- Simple IPFS-based file sharing, modified from pomf.se (RIP in pieces)☆10Updated 7 years ago
- Proxies third-party PDF files and HTML pages with the Hypothesis client embedded, so you can annotate them☆20Updated this week
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing…☆41Updated this week
- The ArchiveWeb.page Site☆27Updated this week
- 🎶 Easily export your tracks from Last.fm and import them to Libre.fm or Scrobble.fm☆27Updated 6 years ago