The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler
☆126Dec 11, 2024Updated last year
Alternatives and similar repositories for distributed-web-crawler
Users that are interested in distributed-web-crawler are comparing it to the libraries listed below.
- The Manta v1 software architecture for Autonomous Underwater Vehicles (AUVs) - Master's thesis☆10Aug 11, 2022Updated 3 years ago
- Proxied asynchronous multi-threaded web scraper via concurrent queues written in Java.☆17Nov 25, 2023Updated 2 years ago
- A Rust client library for Airsim.☆20Sep 18, 2023Updated 2 years ago
- Scrapyd on container infrastructure☆16Apr 11, 2025Updated 11 months ago
- HTTP proxy with per-request uTLS fingerprint mimicry and upstream proxy tunneling. Currently WIP.☆50Jan 14, 2024Updated 2 years ago
- (educational) build your own disk based KV store☆13Jul 27, 2024Updated last year
- Patching CDP (Chrome DevTools Protocol) leaks on OS level. Easy to use with Playwright, Selenium, and other web automation tools.☆158Sep 28, 2025Updated 6 months ago
- ☆12Apr 16, 2025Updated 11 months ago
- A small library for building fast and highly customizable web crawlers☆16Jan 4, 2023Updated 3 years ago
- An example backend with GoLang that uses auth0 for authentication☆18Jan 20, 2023Updated 3 years ago
- Library for creating generic data pipelines and streams☆11Dec 18, 2023Updated 2 years ago
- An extension of the UUV-Simulator for use with Vortex NTNUs autonomous vessels☆35May 17, 2023Updated 2 years ago
- Go SDK for working with Cerbos☆15Mar 19, 2026Updated last week
- A Collection of 10.000 collected Windows Chrome Fingerprints. Usable with an easy-to-use API, available as a compressed (lzma) or full-si…☆267Dec 22, 2024Updated last year
- small MCP server for orchestrating tasks across LLM instances☆24Apr 29, 2025Updated 11 months ago
- A URL shortener written in Go, with a Mongo based backend, Prometheus and Grafana based monitoring, Memcached based write-through caching…☆39Jun 11, 2021Updated 4 years ago
- 27.6% of the Top 10 Million Sites are Dead☆117Nov 4, 2024Updated last year
- 🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser en…☆19Jul 11, 2025Updated 8 months ago
- 🚗 Real time package tracking implementation with RabbitMQ☆60Jul 13, 2022Updated 3 years ago
- 🔮 Vindicate non-organic web traffic via MITM proxy☆81Jul 15, 2024Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler.☆56Feb 10, 2026Updated last month
- A C++ implementation of quadtree☆17Jun 16, 2016Updated 9 years ago
- A library for better integration between django and the WSGI world.☆50Jan 7, 2011Updated 15 years ago
- ☆32Oct 30, 2025Updated 5 months ago
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol☆27May 12, 2024Updated last year
- ☆70Nov 17, 2023Updated 2 years ago
- DIY home security project using Honeywell 5800 series RF sensors☆13Feb 12, 2020Updated 6 years ago
- A word game in the vein of Wordle; try to solve back-to-back code words to get to 100 points.☆27Feb 18, 2026Updated last month
- ☆24Mar 16, 2019Updated 7 years ago
- ☆20Jan 23, 2024Updated 2 years ago
- Download, parse, and filter data from Court Listener, part of the FreeLaw projects. Data-ready for The-Pile.☆15Jun 3, 2023Updated 2 years ago
- TikTok in Go☆14Mar 11, 2021Updated 5 years ago
- DevOps pipeline for Real Time Social/Web Mining☆26Mar 8, 2026Updated 3 weeks ago
- logging for django.☆35Jan 16, 2011Updated 15 years ago
- Go version updater.☆22Jun 20, 2025Updated 9 months ago
- cf_clearance, rack.session, laravel_session, _bm etc... cloudflare cookies generator via headless browser☆33Feb 4, 2023Updated 3 years ago
- hopfield☆30Oct 8, 2021Updated 4 years ago
- Fetch data from HTML and XML via xpath/css and prepare it with regexp☆34Feb 1, 2026Updated last month
- Search the web with advanced filters and LLM-friendly output formats!☆54Sep 15, 2025Updated 6 months ago