Converts HTTrack crawls to WARC files
☆34 Aug 6, 2024 Updated last year
Alternatives and similar repositories for httrack2warc
Users who are interested in httrack2warc are comparing it to the libraries listed below
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆28 Jul 31, 2024 Updated last year
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆12 Oct 5, 2024 Updated last year
- A prototype server to swarm multiple DATs for Webrecorder ☆14 Apr 27, 2019 Updated 6 years ago
- CDXJ Indexing of WARC/ARCs ☆33 Dec 10, 2024 Updated last year
- ☆30 Jun 6, 2024 Updated last year
- Nondestructive warc-in-tar to warc conversion ☆27 Apr 21, 2013 Updated 12 years ago
- Merges HOSTS files ☆12 Dec 19, 2025 Updated 3 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆57 Aug 15, 2024 Updated last year
- A command line utility for listing and searching snapshots in web archives ☆17 Dec 21, 2023 Updated 2 years ago
- Encrypted end to end file transfer ☆108 Jul 13, 2018 Updated 7 years ago
- Fetch git-annex metadata from IMDB ☆10 Feb 10, 2018 Updated 8 years ago
- [ASCII-art banner] ☆11 Feb 17, 2022 Updated 4 years ago
- Utility to create an element from a simple CSS selector ☆13 Aug 1, 2023 Updated 2 years ago
- ☆10 Dec 11, 2021 Updated 4 years ago
- Command line tool for digging into WARC files ☆51 Feb 27, 2026 Updated 3 weeks ago
- Diff two unist trees ☆14 Aug 21, 2020 Updated 5 years ago
- 🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more... ☆38 Aug 12, 2018 Updated 7 years ago
- A server to collect & archive websites that also supports video downloads ☆84 Feb 11, 2023 Updated 3 years ago
- Clone of https://git.kernel.org/pub/scm/linux/kernel/git/jejb/sbsigntools.git/ with patches for YubiKey support ☆10 Aug 14, 2020 Updated 5 years ago
- JS Streaming WARC IO optimized for Browser and Node ☆53 Feb 26, 2026 Updated 3 weeks ago
- ☆16 Dec 13, 2014 Updated 11 years ago
- 💻 Install Openwhyd on your computer, play music in the background ☆17 Nov 10, 2022 Updated 3 years ago
- Specifications developed and maintained by the Webrecorder community. ☆140 Oct 16, 2025 Updated 5 months ago
- Verifiable Credential Extensions ☆12 Feb 12, 2025 Updated last year
- One-Click User Instigated Preservation ☆128 Feb 3, 2019 Updated 7 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆19 Aug 28, 2023 Updated 2 years ago
- A client library for interacting with the Gogs REST API. ☆13 Apr 30, 2019 Updated 6 years ago
- Conifer setup and deployment via Ansible ☆12 Jun 15, 2020 Updated 5 years ago
- Miscellaneous tools for processing WARC files from the CommonCrawl ☆25 Jan 1, 2014 Updated 12 years ago
- ☆60 Apr 11, 2024 Updated last year
- Material for my React Fundamentals Workshop ☆17 Dec 27, 2022 Updated 3 years ago
- Standard implementation of TRC404 ☆10 Jan 20, 2025 Updated last year
- An archival and backup file system for Linux using FUSE. ☆25 Jan 22, 2017 Updated 9 years ago
- The reasoning worked through during the 404CTF 2023 to solve the proposed challenges ☆10 Dec 16, 2023 Updated 2 years ago
- CaddyServer module for processing images on the fly. ☆14 Nov 24, 2025 Updated 3 months ago
- A default backend (404 page) for nginx-ingress in Kubernetes ☆13 Jan 23, 2018 Updated 8 years ago
- PlayStation GPU (WIP) ☆18 Oct 3, 2023 Updated 2 years ago
- ☆16 Sep 9, 2021 Updated 4 years ago
- UEFI signing tools for Linux -- Forked to support AWS CloudHSM ☆13 Aug 25, 2021 Updated 4 years ago