Converts HTTrack crawls to WARC files
☆34Aug 6, 2024Updated last year
Alternatives and similar repositories for httrack2warc
Users that are interested in httrack2warc are comparing it to the libraries listed below
- CDXJ Indexing of WARC/ARCs☆33Dec 10, 2024Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki☆28Jul 31, 2024Updated last year
- Nondestructive warc-in-tar to warc conversion☆27Apr 21, 2013Updated 12 years ago
- Pages saved with SingleFile☆12Mar 16, 2024Updated last year
- Merges HOSTS files☆12Dec 19, 2025Updated 2 months ago
- Fetch git-annex metadata from IMDB☆10Feb 10, 2018Updated 8 years ago
- ☆30Jun 6, 2024Updated last year
- ☆19Jun 19, 2019Updated 6 years ago
- ☆16Dec 13, 2014Updated 11 years ago
- A client library for interacting with the Gogs REST api.☆13Apr 30, 2019Updated 6 years ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution.☆12Oct 5, 2024Updated last year
- A command line utility for listing and searching snapshots in web archives☆17Dec 21, 2023Updated 2 years ago
- Command line tool for digging into WARC files☆50Feb 13, 2026Updated 2 weeks ago
- Simple tool that removes link masking/tracking and optionally resolves shortened links.☆28Oct 11, 2022Updated 3 years ago
- ☆15Sep 30, 2020Updated 5 years ago
- ☆11Feb 17, 2022Updated 4 years ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.☆58Aug 15, 2024Updated last year
- Index Filesystem for FUSE☆17Dec 15, 2021Updated 4 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆25Oct 9, 2017Updated 8 years ago
- Useful for IoT / maker projects for reducing SD, NAND and eMMC block wear via log operations. Uses Zram to minimise precious memory foot…☆20Mar 11, 2020Updated 5 years ago
- An archival and backup file system for Linux using FUSE.☆25Jan 22, 2017Updated 9 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.☆19Aug 28, 2023Updated 2 years ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC)☆92Apr 22, 2025Updated 10 months ago
- Miscellaneous tools for processing WARC files from the CommonCrawl☆25Jan 1, 2014Updated 12 years ago
- ☆20Feb 17, 2019Updated 7 years ago
- Support for writing WARC files with Scrapy☆24Dec 21, 2019Updated 6 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc☆62Jul 9, 2024Updated last year
- Deduplicating filesystem via Python3, FUSE and SQLite☆28Feb 17, 2026Updated last week
- ☆26Dec 15, 2020Updated 5 years ago
- The MediaWiki source pages for "Sakaki's EFI Install Guide" (as hosted on the Gentoo wiki)☆27Jan 9, 2023Updated 3 years ago
- Parse And Create Web ARChive (WARC) files with node.js☆104Jan 29, 2025Updated last year
- Enable Samsung DEX on inner screen for the Galaxy Fold series.☆10Feb 2, 2026Updated 3 weeks ago
- Parse WARC (Web Archive Files) as a node.js stream☆23Oct 20, 2014Updated 11 years ago
- Saves proxied HTTP traffic to a WARC file.☆28Oct 22, 2013Updated 12 years ago
- One-Click User Instigated Preservation☆128Feb 3, 2019Updated 7 years ago
- The ArchiveWeb.page Site☆32Nov 7, 2025Updated 3 months ago
- Golang WARC (Web ARChive) Library☆30Aug 6, 2019Updated 6 years ago
- 404Games Wastelands V2 - Chernarus☆25Jun 25, 2013Updated 12 years ago
- ☆13Feb 28, 2023Updated 3 years ago