nla / httrack2warc
Converts HTTrack crawls to WARC files
☆33, updated last year
Alternatives and similar repositories for httrack2warc
Users interested in httrack2warc are comparing it to the libraries listed below.
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. (☆58, updated last year)
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. (☆12, updated last year)
- (no description) (☆11, updated 4 years ago)
- Command line tool to convert a file in the WARC format to a file in the ZIM format (☆75, updated last week)
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. (☆131, updated 2 weeks ago)
- Recover lost websites from the Web Infrastructure (☆91, updated 5 months ago)
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. (☆28, updated last year)
- Tool and library for handling Web ARChive (WARC) files. (☆164, updated last year)
- A youtube-dl extension with pluggable extractors (☆53, updated 9 months ago)
- download books from archive.org (☆32, updated last year)
- Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now (☆138, updated 9 months ago)
- Strip advertisements from downloaded YouTube videos (☆61, updated 4 years ago)
- URLTeam's second generation of URL shortener archiving tools (☆81, updated 4 months ago)
- wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved (☆30, updated 4 months ago)
- A collection of tools for archiving and analysing the internet. (☆77, updated 3 years ago)
- Home of the official apt/deb package for Ubuntu/Debian-based systems. (☆16, updated last year)
- foobar2000 plugin to submit listen history to your Maloja server (☆13, updated 2 years ago)
- A configurable, reusable tracker with dashboard (☆36, updated 2 years ago)
- Userscript to strip click tracking junk from Google search results URLs (☆15, updated 6 years ago)
- [mirror] Backup a list of github starred repositories for the specified user. (☆143, updated 2 years ago)
- Archiving public telegram messages. (☆16, updated last week)
- A list of things related to software, literature, and other content for Memento (☆104, updated 2 weeks ago)
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. (☆15, updated 5 years ago)
- A server to collect & archive websites that also supports video downloads (☆84, updated 2 years ago)
- Scrape https://unlistedvideos.com/ (☆15, updated 4 years ago)
- Mozilla LZ4 File Decryption and Mining Tools (☆38, updated 8 months ago)
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… (☆19, updated last year)
- Scripts to build and boot warrior virtual machine containing Docker (☆122, updated 9 months ago)
- Archiving URLs (outlinks) from a variety of sources. (☆25, updated last week)
- The ArchiveWeb.page Site (☆32, updated 2 months ago)