teticio / lambda-scraper
Use AWS Lambda functions as a proxy pool to scrape web pages.
☆107 · Updated 8 months ago
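The core idea is to route each request through a different AWS Lambda function, so that fetches may leave from different AWS-assigned IP addresses. It can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the repository's own code. The `handler` function, the `LambdaProxyPool` class, the `make_boto3_invoker` helper and any function names you pass in are hypothetical. The sketch also assumes each proxy function has been deployed separately with `handler` as its entry point. AWS does not guarantee a distinct outbound IP per function, so rotation spreads requests across functions but does not promise a new IP for every request.

```python
import itertools
import json
import urllib.request
from typing import Callable, Iterable


def handler(event, context):
    """Hypothetical Lambda entry point deployed to each proxy function.

    Fetches event["url"] from inside AWS and returns status and body.
    """
    req = urllib.request.Request(event["url"], headers=event.get("headers", {}))
    with urllib.request.urlopen(req, timeout=event.get("timeout", 10)) as resp:
        return {
            "status": resp.status,
            "body": resp.read().decode("utf-8", errors="replace"),
        }


class LambdaProxyPool:
    """Hypothetical round-robin rotation over a set of Lambda function names."""

    def __init__(self, function_names: Iterable[str],
                 invoke: Callable[[str, dict], dict]):
        names = list(function_names)
        if not names:
            raise ValueError("need at least one function name")
        self._cycle = itertools.cycle(names)  # endless round-robin iterator
        self._invoke = invoke                 # injected so it can be stubbed

    def fetch(self, url: str) -> dict:
        # Each call is sent to the next function in the cycle.
        name = next(self._cycle)
        return self._invoke(name, {"url": url})


def make_boto3_invoker(region_name: str) -> Callable[[str, dict], dict]:
    """Hypothetical helper wrapping boto3's Lambda client.

    Requires boto3 to be installed and AWS credentials to be configured.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("lambda", region_name=region_name)

    def invoke(function_name: str, event: dict) -> dict:
        # Synchronous invocation; the handler's return value comes back as JSON.
        response = client.invoke(
            FunctionName=function_name,
            InvocationType="RequestResponse",
            Payload=json.dumps(event).encode("utf-8"),
        )
        return json.loads(response["Payload"].read())

    return invoke
```

For example, with a stub `invoke` that just records the function name, three consecutive `pool.fetch(...)` calls on a two-function pool reach the functions in the order first, second, first. Injecting `invoke` keeps the rotation logic testable without AWS credentials. In a deployment you would pass something like `make_boto3_invoker("us-east-1")` instead.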
Related projects:
- The Web Scraping Club Free Repository ☆116 · Updated last month
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac… ☆216 · Updated 8 months ago
- Minimal set of tools to conduct stealthy scraping. ☆144 · Updated last year
- Library that helps use puppeteer in scrapy. ☆51 · Updated this week
- estela, an elastic web scraping cluster 🕸 ☆167 · Updated 2 months ago
- ☆30 · Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆113 · Updated 2 weeks ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆411 · Updated last year
- A python package for finding e-mails, checking deliverability and more. ☆34 · Updated 4 months ago
- A simple LinkedIn profile scraper implemented as a chrome extension ☆73 · Updated 11 months ago
- Scrapy rotation proxy package with advanced functions ☆92 · Updated 2 years ago
- The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler ☆84 · Updated 4 months ago
- 🕷️ Scrapyd is an application for deploying and running Scrapy spiders. ☆77 · Updated 3 weeks ago
- Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with featur… ☆22 · Updated 3 months ago
- Common crawl extractor ☆67 · Updated 3 months ago
- S3 vector database for LLM Agents and RAG. ☆28 · Updated last year
- Index Common Crawl archives in tabular format ☆105 · Updated last week
- undetected chromedriver Docker ☆27 · Updated last year
- Get structured JSON data from any page. ☆169 · Updated 11 months ago
- Staff scraper library for LinkedIn - obtain experiences, schools, skills & more ☆52 · Updated this week
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. ☆227 · Updated 3 months ago
- Zyte Automatic Extraction integration for Scrapy ☆55 · Updated 2 years ago
- AI-based web-wrapper for web-content-extraction ☆99 · Updated last year
- This repository provides usage examples for the Python module Newspaper3k. ☆138 · Updated 8 months ago
- Scrapfly Python SDK for headless browsers and proxy rotation ☆30 · Updated last week
- Redis Queue Dashboard based on FastAPI ☆81 · Updated last month
- Google Search SERP Scraper ☆101 · Updated last year
- Super Fast, Super Anti-Detect, and Super Intuitive Web Driver ☆40 · Updated last week
- Introducing the AmazonMe webscraper - a powerful tool for extracting data from Amazon.com using the Requests and BeautifulSoup libraries in… ☆55 · Updated 5 months ago
- Python wrapper for Google people-also-ask ☆99 · Updated 2 weeks ago