Converts HTTrack crawls to WARC files
☆34 · Updated Aug 6, 2024
Alternatives and similar repositories for httrack2warc
Users who are interested in httrack2warc also compare it with the repositories listed below.
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆28 · Updated Jul 31, 2024
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆12 · Updated Oct 5, 2024
- CDXJ Indexing of WARC/ARCs ☆34 · Updated May 11, 2026
- Merges HOSTS files ☆12 · Updated Dec 19, 2025
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆57 · Updated Aug 15, 2024
- Pages saved with SingleFile ☆13 · Updated Mar 16, 2024
- Index Filesystem for FUSE ☆17 · Updated Dec 15, 2021
- A command line utility for listing and searching snapshots in web archives ☆18 · Updated Jun 4, 2026
- Encrypted end to end file transfer ☆108 · Updated Jul 13, 2018
- Scripts for FFmpeg ☆18 · Updated Sep 9, 2023
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆98 · Updated Apr 22, 2025
- ██████╗ ███████╗██████╗ ██╔══██╗██╔════╝██╔══██╗ ██████╔╝█████╗ ██║ ██║ ██╔══██╗██╔══╝ ██║ ██║ ██║ ██║███████ ╗██████╔╝ ╚═╝ ╚═╝╚═══… ☆11 · Updated Feb 17, 2022
- A Rust library for reading and writing WARC files ☆60 · Updated Nov 27, 2024
- (no description) ☆10 · Updated Dec 11, 2021
- Command line tool for digging into WARC files ☆50 · Updated Jun 22, 2026
- Diff two unist trees ☆14 · Updated Aug 21, 2020
- 🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more... ☆38 · Updated Aug 12, 2018
- A server to collect & archive websites that also supports video downloads ☆85 · Updated Feb 11, 2023
- Clone of https://git.kernel.org/pub/scm/linux/kernel/git/jejb/sbsigntools.git/ with patches for yubikey support ☆10 · Updated Aug 14, 2020
- Code and data used to build a training dataset for dragnet models ☆10 · Updated Nov 29, 2020
- (no description) ☆16 · Updated Dec 13, 2014
- JS Streaming WARC IO optimized for Browser and Node ☆55 · Updated Mar 25, 2026
- (no description) ☆12 · Updated Oct 13, 2024
- 404Games Wastelands V2 - Chernarus ☆26 · Updated Jun 25, 2013
- Specifications developed and maintained by the Webrecorder community. ☆141 · Updated Oct 16, 2025
- Basis of FragDenStaat.de's "Koalitionstracker" (coalition tracker) ☆15 · Updated Jul 14, 2025
- Small-string compression using the smaz algorithm; fast because it is written in C. Supports Python 3+. ☆13 · Updated Oct 18, 2025
- Removes duplicate documents, videos, and images using popular algorithms such as SimHash, SpotSig, and shingling ☆18 · Updated Aug 28, 2023
- One-Click User Instigated Preservation ☆128 · Updated Feb 3, 2019
- [WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages ☆17 · Updated Apr 14, 2026
- The Wikinflection Corpus, from the paper "Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus" (Metheni… ☆12 · Updated Dec 15, 2023
- (no description) ☆22 · Updated Jun 11, 2026
- Benson turns a list of URLs into MP3s of each web page's contents, so you can take control of your reading backlog! ☆16 · Updated Oct 30, 2024
- A Reddit bot that finds original publish dates on linked articles. ☆10 · Updated Nov 30, 2024
- Miscellaneous tools for processing WARC files from the CommonCrawl ☆25 · Updated Jan 1, 2014
- Single-file C header for UTF-x-to-y conversions + helpers ☆13 · Updated Jun 11, 2023
- (no description) ☆61 · Updated Apr 11, 2024
- Notes on the reasoning used during 404CTF 2023 to solve the challenges it posed ☆10 · Updated Dec 16, 2023
- CaddyServer module for processing images on the fly. ☆15 · Updated Nov 24, 2025