nla / httrack2warc
Converts HTTrack crawls to WARC files
☆33 Updated last year
Alternatives and similar repositories for httrack2warc
Users who are interested in httrack2warc are comparing it to the repositories listed below.
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆58 Updated last year
- Recover lost websites from the Web Infrastructure ☆91 Updated 5 months ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆12 Updated last year
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆130 Updated 4 months ago
- A collection of tools for archiving and analysing the internet. ☆78 Updated 3 years ago
- ☆11 Updated 4 years ago
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆15 Updated 5 years ago
- Archiving public Telegram messages. ☆16 Updated 4 months ago
- Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now ☆136 Updated 9 months ago
- Home of the official apt/deb package for Ubuntu/Debian-based systems. ☆16 Updated last year
- A list of things related to software, literature, and other content for Memento ☆103 Updated last year
- URLTeam's second generation of URL shortener archiving tools ☆79 Updated 4 months ago
- Scripts to build and boot warrior virtual machine containing Docker ☆122 Updated 9 months ago
- A server to collect & archive websites that also supports video downloads ☆84 Updated 2 years ago
- Strip advertisements from downloaded YouTube videos ☆60 Updated 4 years ago
- [mirror] Backup a list of github starred repositories for the specified user. ☆143 Updated 2 years ago
- A youtube-dl extension with pluggable extractors ☆53 Updated 8 months ago
- Archiving URLs (outlinks) from a variety of sources. ☆25 Updated 3 weeks ago
- Grabbing everything from reddit. ☆61 Updated last year
- Tool and library for handling Web ARChive (WARC) files. ☆164 Updated last year
- The (new) discovery backend for https://odcrawler.xyz ☆36 Updated 2 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆168 Updated 4 months ago
- Reverse search an image on every search engine ☆42 Updated 5 years ago
- wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved ☆30 Updated 3 months ago
- Wayback Machine Downloader. Download your entire archived websites from the Internet Archive Wayback Machine. ☆100 Updated 3 years ago
- Scrape https://unlistedvideos.com/ ☆15 Updated 4 years ago
- Mozilla LZ4 File Decryption and Mining Tools ☆38 Updated 8 months ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- A collection of scripts that make spending time on the web easy. ☆70 Updated 2 years ago
- Sources for urls-grab. ☆12 Updated this week