tonywangcn / distributed-web-crawler
The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler
☆120 Updated 7 months ago
Alternatives and similar repositories for distributed-web-crawler
Users who are interested in distributed-web-crawler are comparing it to the libraries listed below.
- 27.6% of the Top 10 Million Sites are Dead ☆110 Updated 9 months ago
- Golang Crawling and scraping framework ☆132 Updated last week
- Airbnb scraper made in Go ☆36 Updated 3 weeks ago
- GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework. ☆97 Updated 2 weeks ago
- Get structured JSON data from any page. ☆177 Updated last year
- New way for collect information from the API's/Websites ☆121 Updated 3 months ago
- CLI utility to scrape emails from websites ☆167 Updated last year
- Spider ported to Python ☆87 Updated 6 months ago
- A powerful starter template for building undetectable web scrapers and browser automation bots. ☆54 Updated 3 months ago
- Agency: Robust LLM Agent Management with Go ☆66 Updated last year
- go-trafilatura is a Go port of the trafilatura Python library. ☆91 Updated 2 months ago
- [deprecated] AI Gateway - core infrastructure stack for building production-ready AI Applications ☆160 Updated last year
- Chew is a Go library for processing various content types into markdown/plaintext. ☆42 Updated 5 months ago
- Golinkedin is a library written in pure golang for scraping Linkedin ☆43 Updated last year
- Amazon crawler made in Go ☆40 Updated 4 months ago
- Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications ☆101 Updated 10 months ago
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ☆70 Updated 9 months ago
- Get your Unit tests written with every PR ☆33 Updated 4 months ago
- The Fastest LLM Gateway with built in OTel observability and MCP gateway ☆285 Updated this week
- Use AWS Lambda functions as a proxy pool to scrape web pages. ☆135 Updated last year
- rotating open proxy multiplexer ☆183 Updated 3 weeks ago
- A TUI for Managing and Searching with Meilisearch ☆17 Updated last week
- Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, image & webpage links extr… ☆206 Updated 3 weeks ago
- Fast, lightweight metadata scraper for URLs. Written in Go. ☆26 Updated 2 months ago
- Staff fetcher library for LinkedIn - obtain experiences, schools, skills & contact info ☆171 Updated last month
- structured outputs for llms ☆164 Updated last month
- JotBot generates the missing code documentation for your Go and TypeScript projects. Powered by AI. ☆36 Updated 11 months ago
- Conveyor CI is an extensible Software Framework/Engine for building CI/CD Platforms. ☆43 Updated this week
- A simple ChatGPT clone built using Go ☆38 Updated 2 years ago
- Durable execution in Go with the Golang Inngest SDK. Write durable functions in your existing app. ☆59 Updated this week