nla / httrack2warc
Converts HTTrack crawls to WARC files
☆33 Updated last year
Alternatives and similar repositories for httrack2warc
Users who are interested in httrack2warc are comparing it to the repositories listed below.
- ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file. ☆14 Updated 4 years ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated last year
- Archiving public telegram messages. ☆15 Updated last month
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆58 Updated last year
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆15 Updated 5 years ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆130 Updated last month
- ☆11 Updated 3 years ago
- A server to collect & archive websites that also supports video downloads ☆86 Updated 2 years ago
- Home of the official apt/deb package for Ubuntu/Debian-based systems. ☆17 Updated last year
- A youtube-dl extension with pluggable extractors ☆52 Updated 6 months ago
- Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now ☆132 Updated 6 months ago
- The (new) discovery backend for https://odcrawler.xyz ☆34 Updated 2 years ago
- Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution. ☆16 Updated 2 months ago
- URLTeam's second generation of URL shortener archiving tools ☆79 Updated last month
- Strip advertisements from downloaded YouTube videos ☆60 Updated 4 years ago
- A command line tool to archive a git repository from GitHub to the Internet Archive. ☆91 Updated 4 years ago
- Server and bookmarklet to download files via youtube-dl directly from your browser. Cross platform single binary installation, web browse… ☆78 Updated 4 months ago
- Grabbing everything from reddit. ☆61 Updated last year
- END OF THE WORLD ☆11 Updated 5 years ago
- Mozilla LZ4 File Decryption and Mining Tools ☆37 Updated 5 months ago
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆18 Updated last year
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- [mirror] Backup a list of github starred repositories for the specified user. ☆141 Updated 2 years ago
- Scrape https://unlistedvideos.com/ ☆15 Updated 4 years ago
- download books from archive.org ☆31 Updated 11 months ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 Updated last year
- a gui for TRID ( http://mark0.net/soft-trid-e.html ) ☆20 Updated 9 years ago
- wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved ☆30 Updated 3 weeks ago
- 🔍 Reverse search an image on every search engine ☆43 Updated 5 years ago
- Wayback Machine Downloader. 🔥 Download your entire archived websites from the Internet Archive Wayback Machine. ☆100 Updated 3 years ago