WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.
☆48 · Mar 19, 2018 · Updated 7 years ago
Alternatives and similar repositories for WarcMiddleware
Users who are interested in WarcMiddleware are comparing it to the libraries listed below.
- Serving content from a WARC ☆62 · Jan 5, 2013 · Updated 13 years ago
- Convert Directories, Files and ZIP Files to Web Archives (WARC) ☆93 · Apr 22, 2025 · Updated 10 months ago
- Update a local archive of your tweets. ☆49 · Oct 12, 2012 · Updated 13 years ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆56 · Oct 21, 2018 · Updated 7 years ago
- C++ library to parse WARC files ☆11 · Jan 27, 2019 · Updated 7 years ago
- Your Personal Finance Simple & Private https://entaxy.io ☆10 · Feb 4, 2026 · Updated 3 weeks ago
- Tool and library for handling Web ARChive (WARC) files. ☆165 · Oct 11, 2024 · Updated last year
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆47 · Dec 4, 2017 · Updated 8 years ago
- Pages saved with SingleFile ☆12 · Mar 16, 2024 · Updated last year
- CDXJ Indexing of WARC/ARCs ☆33 · Dec 10, 2024 · Updated last year
- Converts HTTrack crawls to WARC files ☆34 · Aug 6, 2024 · Updated last year
- your on-line tool for task management fun ☆59 · Jul 5, 2010 · Updated 15 years ago
- Fetch git-annex metadata from IMDB ☆10 · Feb 10, 2018 · Updated 8 years ago
- Browser-based annotation tool for Framenet ☆16 · Jan 27, 2015 · Updated 11 years ago
- Comparing warc files ☆17 · Feb 21, 2019 · Updated 7 years ago
- ☆16 · Dec 13, 2014 · Updated 11 years ago
- ☆13 · Dec 4, 2019 · Updated 6 years ago
- Wget-compatible web downloader and crawler. ☆600 · Apr 29, 2024 · Updated last year
- NTLM auth plugin for HTTPie ☆19 · Dec 8, 2016 · Updated 9 years ago
- Helps building a minimal Apache for DokuWiki-on-a-Stick ☆16 · Dec 13, 2023 · Updated 2 years ago
- Scrapyd web application for managing projects and spiders and visualizing job logs and items ☆14 · Jan 19, 2016 · Updated 10 years ago
- Read and write WARC files in Go ☆49 · Feb 13, 2026 · Updated 2 weeks ago
- Serialize HTML tables into JSON in Ruby ☆17 · Aug 23, 2013 · Updated 12 years ago
- Scrapy downloader middleware that stores response HTMLs to disk. ☆18 · Jan 14, 2026 · Updated last month
- A collection of tools for archiving and analysing the internet. ☆78 · Jul 6, 2022 · Updated 3 years ago
- Web archiving using Google Chrome ☆46 · Dec 30, 2019 · Updated 6 years ago
- Python WSGI Middleware for adding HTTP/S proxy support to any WSGI Application ☆24 · Oct 27, 2020 · Updated 5 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Oct 9, 2017 · Updated 8 years ago
- An archival and backup file system for Linux using FUSE. ☆25 · Jan 22, 2017 · Updated 9 years ago
- Script that simplifies exporting all your stuff out of Thingiverse ☆30 · Mar 30, 2021 · Updated 4 years ago
- Support for writing WARC files with Scrapy ☆24 · Dec 21, 2019 · Updated 6 years ago
- a framework and language for exploring and analyzing feeds of social media data. ☆23 · Jan 25, 2012 · Updated 14 years ago
- Read and write WARC files in Go ☆50 · Apr 9, 2018 · Updated 7 years ago
- Warcbase is an open-source platform for managing and analyzing web archives ☆162 · Dec 8, 2017 · Updated 8 years ago
- Git is distributed version control system focused on speed, effectivity and real-world usability on large projects. Its highlights includ… ☆42 · Feb 18, 2010 · Updated 16 years ago
- Tools to Work with the Web Archive Ecosystem in R ☆21 · Aug 20, 2017 · Updated 8 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 · Feb 10, 2026 · Updated 3 weeks ago
- Fast filtering and animation of large dynamic networks ☆39 · May 24, 2016 · Updated 9 years ago
- Deduplicating filesystem via Python3, FUSE and SQLite ☆28 · Feb 17, 2026 · Updated 2 weeks ago