Converts HTTrack crawls to WARC files
☆34 Aug 6, 2024 Updated last year
Alternatives and similar repositories for httrack2warc
Users that are interested in httrack2warc are comparing it to the libraries listed below.
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆28 Jul 31, 2024 Updated last year
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆12 Oct 5, 2024 Updated last year
- Web archive index server based on RocksDB ☆43 Updated this week
- A prototype server to swarm multiple DATs for Webrecorder ☆14 Apr 27, 2019 Updated 7 years ago
- CDXJ Indexing of WARC/ARCs ☆34 May 11, 2026 Updated 3 weeks ago
- ☆30 Jun 6, 2024 Updated 2 years ago
- Merges HOSTS files ☆12 Dec 19, 2025 Updated 5 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆57 Aug 15, 2024 Updated last year
- Pages saved with SingleFile ☆13 Mar 16, 2024 Updated 2 years ago
- A command line utility for listing and searching snapshots in web archives ☆17 Updated this week
- Fetch git-annex metadata from IMDB ☆11 Feb 10, 2018 Updated 8 years ago
- ☆11 Feb 17, 2022 Updated 4 years ago
- 🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more... ☆38 Aug 12, 2018 Updated 7 years ago
- code and data used to build a training dataset for dragnet models ☆10 Nov 29, 2020 Updated 5 years ago
- JS Streaming WARC IO optimized for Browser and Node ☆54 Mar 25, 2026 Updated 2 months ago
- ☆16 Dec 13, 2014 Updated 11 years ago
- 404Games Wastelands V2 - Chernarus ☆25 Jun 25, 2013 Updated 12 years ago
- Specifications developed and maintained by the Webrecorder community. ☆140 Oct 16, 2025 Updated 7 months ago
- Support for writing WARC files with Scrapy ☆24 Dec 21, 2019 Updated 6 years ago
- Verifiable Credential Extensions ☆12 Feb 12, 2025 Updated last year
- One-Click User Instigated Preservation ☆129 Feb 3, 2019 Updated 7 years ago
- A client library for interacting with the Gogs REST api. ☆13 Apr 30, 2019 Updated 7 years ago
- A reddit bot that finds original publish dates on linked articles. ☆10 Nov 30, 2024 Updated last year
- Miscellaneous tools for processing WARC files from the CommonCrawl ☆25 Jan 1, 2014 Updated 12 years ago
- Single file C header for UTF-x-to-y conversions + helpers ☆13 Jun 11, 2023 Updated 2 years ago
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆1,047 Updated this week
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 Aug 10, 2023 Updated 2 years ago
- A simple 404 page that uses the pathname as input to generate a 404 message. ☆13 Apr 28, 2018 Updated 8 years ago
- The reasoning worked through during 404CTF 2023 to solve the proposed challenges ☆10 Dec 16, 2023 Updated 2 years ago
- An archival and backup file system for Linux using FUSE. ☆25 Jan 22, 2017 Updated 9 years ago
- CaddyServer module for processing images on the fly. ☆15 Nov 24, 2025 Updated 6 months ago
- PlayStation GPU (WIP) ☆18 Oct 3, 2023 Updated 2 years ago
- A C# library for loading LSD: Dream Emulator data files. ☆15 Aug 28, 2023 Updated 2 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆62 Jul 9, 2024 Updated last year
- ☆16 Sep 9, 2021 Updated 4 years ago
- Placeholder for tracking issues on deca website ☆14 Aug 13, 2020 Updated 5 years ago
- A Simple C++ based CSSParser ☆18 May 13, 2026 Updated 3 weeks ago
- Tools to Work with the Web Archive Ecosystem in R ☆20 Aug 20, 2017 Updated 8 years ago
- Parse WARC (Web Archive Files) as a node.js stream ☆23 Oct 20, 2014 Updated 11 years ago