tonywangcn / distributed-web-crawler
The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler
☆109Updated 2 months ago
Alternatives and similar repositories for distributed-web-crawler:
Users who are interested in distributed-web-crawler are comparing it to the libraries listed below.
- Spider ported to Python☆66Updated 3 weeks ago
- 27.6% of the Top 10 Million Sites are Dead☆104Updated 3 months ago
- Golang crawling and scraping framework☆103Updated last week
- Airbnb scraper made in Go☆34Updated 9 months ago
- Golinkedin is a library written in pure Go for scraping LinkedIn☆41Updated 10 months ago
- [deprecated] AI Gateway - core infrastructure stack for building production-ready AI Applications☆157Updated 10 months ago
- Get structured JSON data from any page.☆175Updated last year
- Execute agentic workflows defined in simple YAML files☆112Updated this week
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information.☆62Updated 4 months ago
- Improve technical documentation with the power of AI.☆25Updated last week
- Chew is a Go library for processing various content types into markdown/plaintext.☆40Updated this week
- A new way to collect information from APIs and websites☆121Updated 2 months ago
- Chatroom app where messages are sent to GPT, Claude, Mistral, Together, Grok, Groq, vLLM, Ollama & streamed to the frontend.☆39Updated this week
- Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications☆87Updated 4 months ago
- S3 vector database for LLM Agents and RAG.☆35Updated last year
- Structured outputs for LLMs☆129Updated 5 months ago
- GuardRail: Advanced tool for data analysis and AI content generation using OpenAI GPT models. Features sentiment analysis, content classi…☆127Updated last year
- Transactional email magic link and One-Time Password (OTP) authentication platform. Sign up, log in, password resets, email verification,…☆30Updated last month
- Self-hosted version of Microsoft's OmniParser Image-to-text model☆40Updated 2 months ago
- Agency: Robust LLM Agent Management with Go☆65Updated 10 months ago
- Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!☆22Updated 2 months ago
- GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.☆94Updated this week
- Rotating open proxy multiplexer☆175Updated 2 months ago
- Data Encoding and Representation Analysis☆40Updated last year
- The BaseMind.AI monorepo☆22Updated last week
- A TUI for Managing and Searching with Meilisearch☆16Updated this week
- Extract structured data from any unstructured web page☆41Updated 10 months ago
- CLI to verify whether an email address is deliverable. Uses SMTP to validate email addresses without sending an email.☆19Updated 2 months ago
- Vector Embedding Server in under 100 lines of code☆22Updated 11 months ago
- A simple Python-based tool designed to scrape websites for content.☆52Updated 4 months ago