Read and write WARC files in Go
☆49 · Feb 13, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for gowarc
Users who are interested in gowarc are comparing it to the libraries listed below.
- Web archive index server based on RocksDB ☆38 · Updated this week
- State-of-the-art web crawler 🌱 ☆384 · Updated this week
- Command line tool for digging into WARC files ☆51 · Updated this week
- CDXJ Indexing of WARC/ARCs ☆33 · Dec 10, 2024 · Updated last year
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en… ☆19 · Jul 11, 2025 · Updated 7 months ago
- Read and write WARC files in Go ☆48 · Apr 9, 2018 · Updated 7 years ago
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆62 · Jul 9, 2024 · Updated last year
- ArchiveWeb.page Express! ☆14 · Nov 1, 2024 · Updated last year
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆56 · Oct 21, 2018 · Updated 7 years ago
- Docker for ScanTailor and ScanTailor Advanced ☆14 · Mar 17, 2024 · Updated last year
- Add your configs for tmux ☆18 · Apr 3, 2022 · Updated 3 years ago
- A ServiceWorker for client-side reconstruction of composite mementos ☆16 · Mar 6, 2025 · Updated 11 months ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆16 · Jun 10, 2021 · Updated 4 years ago
- Pages saved with SingleFile ☆12 · Mar 16, 2024 · Updated last year
- ☆16 · Oct 2, 2025 · Updated 5 months ago
- ☆16 · Dec 13, 2014 · Updated 11 years ago
- ☆17 · Mar 31, 2025 · Updated 11 months ago
- Comparing WARC files ☆17 · Feb 21, 2019 · Updated 7 years ago
- A fast URL parser for Go ☆40 · Mar 4, 2023 · Updated 2 years ago
- lod-explorativ is a prototype of a Svelte webapp which lets you explore bibliographic resources from a topic's point of view. ☆15 · Jan 19, 2022 · Updated 4 years ago
- OAI-PMH harvester in shell. ☆17 · Dec 23, 2025 · Updated 2 months ago
- A client for the Archive-It and Webrecorder WASAPI Data Transfer API ☆16 · Oct 18, 2019 · Updated 6 years ago
- A command line utility for listing and searching snapshots in web archives ☆17 · Dec 21, 2023 · Updated 2 years ago
- ☆16 · Nov 20, 2017 · Updated 8 years ago
- Archiving public Telegram messages. ☆17 · Updated this week
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆48 · Mar 19, 2018 · Updated 7 years ago
- Centralised repository for WARC usage specifications. ☆125 · Oct 12, 2025 · Updated 4 months ago
- Efficient hOCR tooling ☆55 · Aug 18, 2025 · Updated 6 months ago
- ☆22 · Mar 22, 2023 · Updated 2 years ago
- DuckDB Engine as Google Sheets Library ☆20 · Dec 14, 2024 · Updated last year
- Process lines in parallel. ☆21 · Jan 23, 2025 · Updated last year
- Call git-annex commands from Python ☆46 · Nov 16, 2022 · Updated 3 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆55 · Feb 10, 2026 · Updated 3 weeks ago
- Support for writing WARC files with Scrapy ☆24 · Dec 21, 2019 · Updated 6 years ago
- A tool for archiving DokuWiki ☆28 · Jan 30, 2026 · Updated last month
- Converts WARC files to static HTML ☆51 · Sep 18, 2025 · Updated 5 months ago
- WHATWG conformant URL parser for the Go language ☆26 · Apr 16, 2025 · Updated 10 months ago
- A tool to help upload sets of posters from ThePosterDB and MediUX to your Plex server in seconds! ☆35 · Feb 14, 2026 · Updated 2 weeks ago
- ☆27 · Oct 14, 2022 · Updated 3 years ago