webrecorder / warcio
Streaming WARC/ARC library for fast web archive IO
☆410 · Updated 4 months ago
Alternatives and similar repositories for warcio:
Users interested in warcio are comparing it to the libraries listed below.
- Python library for reading and writing warc files · ☆239 · Updated 3 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) · ☆160 · Updated 4 years ago
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine · ☆169 · Updated 3 months ago
- Index Common Crawl archives in tabular format · ☆117 · Updated last month
- A command-line tool for using CommonCrawl Index API at http://index.commoncrawl.org/ · ☆189 · Updated 6 years ago
- WARC and ARC indexing and discovery tools.