internetarchive/umbra

A queue-controlled browser automation tool for improving web crawl quality

☆68

Alternatives and similar repositories for umbra

Users who are interested in umbra are comparing it to the libraries listed below.

nla / outbackcdx
Web archive index server based on RocksDB
☆43 · Jul 9, 2026 · Updated 2 weeks ago

iipc / twittervane
Using social media to steer web archiving and curation.
☆18 · Nov 20, 2015 · Updated 10 years ago

vinaygoel / archive-analysis
Tools to analyze web archives
☆20 · Jul 12, 2016 · Updated 10 years ago

internetarchive / trough
Trough: Big data, small databases.
☆43 · Jul 25, 2024 · Updated last year

oduwsdl / ORS
Object Resource Stream and CDXJ Drafts
☆15 · Nov 28, 2018 · Updated 7 years ago

webrecorder / cdxj-indexer
CDXJ Indexing of WARC/ARCs
☆35 · May 11, 2026 · Updated 2 months ago

helgeho / ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…
☆161 · Oct 8, 2025 · Updated 9 months ago

maturban / WARCMerge
Merging WARCs into a single WARC file
☆15 · Aug 29, 2014 · Updated 11 years ago

ukwa / ukwa-pywb
☆11 · Nov 21, 2025 · Updated 8 months ago

CobwebOrg / cobweb
Collaborative collection development for web archives
☆19 · Sep 5, 2019 · Updated 6 years ago

internetarchive / webarchive-commons
☆15 · Sep 8, 2016 · Updated 9 years ago

PromyLOPh / crocoite
Web archiving using Google Chrome
☆45 · Dec 30, 2019 · Updated 6 years ago

anjackson / sliver
A tool for collecting archival slivers of the web and web archives
☆19 · Jun 1, 2026 · Updated last month

WASAPI-Community / data-transfer-apis
WASAPI data transfer APIs
☆50 · Apr 23, 2022 · Updated 4 years ago

internetarchive / brozzler
brozzler - distributed browser-based web crawler
☆809 · Jul 7, 2026 · Updated 2 weeks ago

internetarchive / ia-hadoop-tools
☆23 · Feb 22, 2024 · Updated 2 years ago

ikreymer / pywb-webrecorder
Check out https://github.com/webrecorder/webrecorder for a newer version matching https://webrecorder.io
☆38 · Oct 16, 2015 · Updated 10 years ago

glenrobson / iiif_stuff
IIIF Examples and useful code
☆20 · Sep 10, 2025 · Updated 10 months ago

thatandromeda / hamlet
How About Machine Learning Enhancing Theses? - a pilot discovery project
☆14 · May 23, 2023 · Updated 3 years ago

gwu-libraries / social-feed-manager
"Old SFM" -- manage rules and streams from social data sources, starting with twitter.
☆86 · Aug 10, 2023 · Updated 2 years ago

jellever / DominantColor
Basic implementation for calculating the dominant color in an image.
☆12 · Jan 1, 2016 · Updated 10 years ago

iipc / warc-specifications
Centralised repository for WARC usage specifications.
☆129 · Apr 4, 2026 · Updated 3 months ago

ikreymer / webarchive-indexing
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
☆46 · Dec 4, 2017 · Updated 8 years ago

rdf2h / rdf2h
Render resources described in RDF using logicless templates.
☆15 · Dec 30, 2022 · Updated 3 years ago

vinaygoel / ars-workshop
Archive Research Services Workshop
☆31 · Sep 29, 2017 · Updated 8 years ago

natliblux / warc-safe
A tool for detecting viruses and NSFW material in WARC files
☆18 · Jul 15, 2026 · Updated last week

webrecorder / pywb-remote-browsers
Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives
☆16 · Jun 10, 2021 · Updated 5 years ago

cisocrgroup / Resources
Manuals, lexica, OCR test data for PoCoTo and the profiler
☆15 · Jul 2, 2021 · Updated 5 years ago

peterk / munin-indexer
A social media open post web archiving tool
☆26 · Feb 4, 2026 · Updated 5 months ago

iipc / robustlinks
Links on the web break all the time, robustify them!
☆61 · Mar 5, 2026 · Updated 4 months ago

web-archive-group / heritrix-walkthrough
☆10 · Jun 10, 2016 · Updated 10 years ago

DocNow / waybackprov
Utility to fetch provenance information from Internet Archive's Wayback Machine
☆15 · Feb 5, 2026 · Updated 5 months ago

lintool / warcbase
Warcbase is an open-source platform for managing and analyzing web archives
☆162 · Dec 8, 2017 · Updated 8 years ago

albertmeronyo / pyldn
A pythonic Linked Data Notifications (LDN) receiver
☆14 · Jan 31, 2019 · Updated 7 years ago

spark4lib / code4lib2018
☆14 · Feb 13, 2018 · Updated 8 years ago

ianmilligan1 / Historian-WARC-1
The Historian's WARC Toolkit
☆16 · May 14, 2015 · Updated 11 years ago

iipc / warc2html
Converts WARC files to static HTML
☆59 · Sep 18, 2025 · Updated 10 months ago

peterk / warcworker
A dockerized, queued high fidelity web archiver based on Squidwarc
☆62 · Jul 9, 2024 · Updated 2 years ago

RubenVerborgh / Refine-NER-Extension
Named-Entity Recognition extension for Google Refine / OpenRefine
☆74 · Jun 21, 2017 · Updated 9 years ago