nla / httrack2warc
Converts HTTrack crawls to WARC files
☆32 · Updated 8 months ago
Alternatives and similar repositories for httrack2warc:
Users who are interested in httrack2warc are comparing it to the repositories listed below
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 · Updated 6 months ago
- ArchiveBoxMatic: configure ArchiveBox with the simplicity of a yaml file. ☆14 · Updated 4 years ago
- Archiving public telegram messages. ☆12 · Updated 3 months ago
- ☆11 · Updated 3 years ago
- Home of the official apt/deb package for Ubuntu/Debian-based systems. ☆17 · Updated 6 months ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆97 · Updated 10 months ago
- Scrape https://unlistedvideos.com/ ☆14 · Updated 3 years ago
- simple script to convert web resources to a single warc file ☆21 · Updated last year
- Awesome list dedicated to digital and data preservation tools, sources, services and so on. ☆25 · Updated 2 years ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆53 · Updated 7 months ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆28 · Updated 6 months ago
- 🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more... ☆36 · Updated 6 years ago
- A server to collect & archive websites that also supports video downloads ☆86 · Updated 2 years ago
- Merging WARCs into a single WARC file ☆15 · Updated 10 years ago
- Script to extract entire font families from Fonts.com, rips them as woff2 and final output includes woff2 and ttf files ☆24 · Updated 3 years ago
- Adblock/AdGuard filters for various self-empowerment ☆20 · Updated 4 months ago
- Archiving URLs (outlinks) from a variety of sources. ☆21 · Updated last month
- Find open directories using Open Directory Search Tool ☆33 · Updated 5 years ago
- Browser userscript to clean up hyperlink redirections and link shims ☆19 · Updated 3 years ago
- Server and bookmarklet to download files via youtube-dl directly from your browser. Cross platform single binary installation, web browse… ☆70 · Updated last year
- The (new) discovery backend for https://odcrawler.xyz ☆29 · Updated 2 years ago
- A configurable, reusable tracker with dashboard ☆34 · Updated last year
- The root of the webcurator tool project, containing all modules needed to run a fully functional webcurator tool. ☆7 · Updated last week
- Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now ☆120 · Updated this week
- A youtube-dl extension with pluggable extractors ☆49 · Updated last month
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆15 · Updated 4 years ago
- A utility to track, save and view stats on your soulseek uploads ☆34 · Updated 9 months ago
- download books from archive.org ☆24 · Updated 5 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆39 · Updated this week
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆15 · Updated 3 years ago