tonywangcn / distributed-web-crawler
The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler
☆113 Updated 3 months ago
Alternatives and similar repositories for distributed-web-crawler:
Users who are interested in distributed-web-crawler are comparing it to the libraries listed below.
- 27.6% of the Top 10 Million Sites are Dead ☆105 Updated 4 months ago
- Spider ported to Python ☆71 Updated 2 months ago
- Airbnb scraper made in Go ☆34 Updated 2 weeks ago
- Golinkedin is a library written in pure golang for scraping LinkedIn ☆41 Updated 11 months ago
- Golang Crawling and scraping framework ☆108 Updated last month
- Amazon crawler made in Go ☆39 Updated 2 weeks ago
- Shopify Scraper package to extract all products from a Shopify site and return them in a Pandas dataframe. ☆30 Updated last year
- 🚀 OFFICIAL STARTER TEMPLATE FOR BOTASAURUS SCRAPING FRAMEWORK 🤖 ☆23 Updated last month
- GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework. ☆95 Updated 2 weeks ago
- TypeScript library for Google search scraping using http requests with proxy support, pagination, and regional customization. Built for w… ☆29 Updated last month
- Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications ☆91 Updated 5 months ago
- structured outputs for llms ☆137 Updated 7 months ago
- Agency: Robust LLM Agent Management with Go ☆66 Updated last year
- A TUI for Managing and Searching with Meilisearch ☆16 Updated this week
- Improve technical documentation with the power of AI. ☆30 Updated 3 weeks ago
- A distributed in-memory, durable key value database designed for massive amounts of critical data and low latency. ☆57 Updated last month
- Curated list of everything related to captchas, including libraries, solvers and scoring ☆26 Updated 8 months ago
- CLI to verify if an email address is deliverable. Uses SMTP to validate email addresses without sending an email. ☆20 Updated 2 weeks ago
- The Web Scraping Club Free Repository ☆137 Updated 4 months ago
- Data Encoding and Representation Analysis ☆40 Updated last year
- Get structured JSON data from any page. ☆175 Updated last year
- Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source! ☆25 Updated 4 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆62 Updated 5 months ago
- CLI utility to scrape emails from websites ☆159 Updated last year
- Chew is a Go library for processing various content types into markdown/plaintext. ☆41 Updated last month
- Backup PostgreSQL to MinIO ☆16 Updated last month
- A simple ChatGPT clone built using Go ☆36 Updated last year
- Automate Pull Request Reviews with AI 🪄 ☆59 Updated 2 weeks ago
- Transactional email magic link and One-Time Password (OTP) authentication platform. Sign up, log in, password resets, email verification,… ☆30 Updated 2 months ago
- Progzee is a Python library for simplifying IP proxy usage in HTTP requests. ☆16 Updated last month