Converts HTTrack crawls to WARC files
☆34 · Aug 6, 2024 · Updated last year
Alternatives and similar repositories for httrack2warc
Users who are interested in httrack2warc are comparing it to the repositories listed below.
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆28 · Jul 31, 2024 · Updated last year
- Web archive index server based on RocksDB ☆38 · Apr 1, 2026 · Updated last week
- ☆30 · Jun 6, 2024 · Updated last year
- Nondestructive warc-in-tar to warc conversion ☆27 · Apr 21, 2013 · Updated 12 years ago
- URL canonicalization library for Python and Java