Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
☆28Jul 31, 2024Updated last year
Alternatives and similar repositories for sandcrawler
Users that are interested in sandcrawler are comparing it to the libraries listed below.
- Official Python package for ArchiveBox, the self-hosted internet archiving solution.☆12Oct 5, 2024Updated last year
- EpochFS is a versioned cloud file system with git-like branching, transaction support.☆17Mar 11, 2026Updated last week
- Trough: Big data, small databases.☆42Jul 25, 2024Updated last year
- ☆31Updated this week
- Demo app built using AngularJS with Backand serving as the back end☆13Mar 1, 2017Updated 9 years ago
- A prototype server to swarm multiple DATs for Webrecorder☆14Apr 27, 2019Updated 6 years ago
- ☆11Feb 17, 2022Updated 4 years ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.☆57Aug 15, 2024Updated last year
- produce a stream of citation data coming off wikimedia☆12Mar 28, 2017Updated 8 years ago
- 🗄 Save an archived copy of websites from Pocket/Pinboard/Bookmarks/RSS. Outputs HTML, PDFs, and more...☆38Aug 12, 2018Updated 7 years ago
- Python script to create CDX index files of WARC data☆21Sep 4, 2025Updated 6 months ago
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…☆157Oct 8, 2025Updated 5 months ago
- code and data used to build a training dataset for dragnet models☆10Nov 29, 2020Updated 5 years ago
- WASAPI data transfer APIs☆49Apr 23, 2022Updated 3 years ago
- A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons☆25Jul 3, 2016Updated 9 years ago
- consume data from Environment and Climate Change Canada☆13Jul 20, 2020Updated 5 years ago
- Run pkg.scripts subtasks in a runner-agnostic way (npm/yarn, whichever launched the main script)☆11Dec 25, 2023Updated 2 years ago
- Analytic platform for the HAL research archive (in development)☆13Oct 2, 2020Updated 5 years ago
- 🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026)☆17Feb 27, 2026Updated 3 weeks ago
- The Wikinflection Corpus, from the paper "Wikinflection Corpus: A (Better) Multilingual, Morpheme-Annotated Inflectional Corpus" (Metheni…☆12Dec 15, 2023Updated 2 years ago
- Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog!☆16Oct 30, 2024Updated last year
- Scraper for German democracy documents☆44Sep 12, 2023Updated 2 years ago
- ☆11Apr 16, 2025Updated 11 months ago
- A simple 404 page that uses the pathname as input to generate a 404 message.☆13Apr 28, 2018Updated 7 years ago
- Search and Proxy for Google web fonts☆16Sep 28, 2024Updated last year
- A browser extension providing Open Access bibliographical services☆18Dec 9, 2022Updated 3 years ago
- The reasoning worked through during the 404CTF 2023 to solve the proposed challenges☆10Dec 16, 2023Updated 2 years ago
- Utility to compile string of chemical terms into data structure with chemical formula and composition☆13Sep 17, 2021Updated 4 years ago
- A default backend (404 page) for nginx-ingress in Kubernetes☆13Jan 23, 2018Updated 8 years ago
- Nim and awk based bot for Wikipedia☆12Feb 28, 2020Updated 6 years ago
- The Zonemaster Backend - part of the Zonemaster project☆16Dec 19, 2025Updated 3 months ago
- ☆17Jul 17, 2025Updated 8 months ago
- ☆12Dec 11, 2022Updated 3 years ago
- Caliper is a project for managing units of measure and the conversions between them.☆16Feb 17, 2026Updated last month
- CVE-2021-40438 exploit PoC with Docker setup.☆12Oct 24, 2021Updated 4 years ago
- ☆16Sep 9, 2021Updated 4 years ago
- 🚨 slog: Kafka handler☆12Feb 2, 2026Updated last month
- Sublime Text API Version Documenter☆11Jan 3, 2023Updated 3 years ago
- Transform any binary file to a PNG image☆13Jul 19, 2019Updated 6 years ago