teticio / lambda-scraper
Use AWS Lambda functions as a proxy pool to scrape web pages.
☆112 Updated 10 months ago
Related projects
Alternatives and complementary repositories for lambda-scraper
- Minimal set of tools to conduct stealthy scraping. ☆150 Updated last year
- estela, an elastic web scraping cluster 🕸 ☆173 Updated last week
- The Web Scraping Club Free Repository ☆128 Updated 2 weeks ago
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac… ☆239 Updated 10 months ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆415 Updated last year
- Clean, filter and sample URLs to optimize data collection - Python & command-line - Deduplication, spam, content and language filters ☆126 Updated 3 weeks ago
- Library that helps use puppeteer in scrapy. ☆52 Updated this week
- Web scraping API for building AI applications. ☆40 Updated 9 months ago
- Open source SaaS metrics (built by Paper) ☆156 Updated 3 years ago
- Shopify Scraper package to extract all products from a Shopify site and return them in a Pandas dataframe. ☆29 Updated last year
- This script can help you to submit URLs in bulk to the Google Indexing API. ☆34 Updated last year
- A simple LinkedIn profile scraper implemented as a chrome extension ☆74 Updated last year
- Cloud crawler functions for scrapeulous ☆44 Updated 3 years ago
- This repository provides usage examples for the Python module Newspaper3k. ☆142 Updated 10 months ago
- Get structured JSON data from any page. ☆175 Updated last year
- A python package for finding e-mails, checking deliverability and more. ☆48 Updated 6 months ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON. ☆69 Updated 3 years ago
- Serverless selenium which dynamically executes any given code. ☆56 Updated 2 years ago
- Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support… ☆107 Updated last year
- G2 Scraper helps you collect G2 product data, including names, product descriptions, reviews, ratings, comparisons, alternatives, and mor… ☆37 Updated 3 months ago
- A basic Python 3-based web scraper for extracting reviews from Amazon. Built using Selectorlib and requests ☆61 Updated 8 months ago
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… ☆90 Updated 2 years ago
- Script that takes any long form video or podcast and outputs clips for social media ☆103 Updated 10 months ago
- Fully working applications that demonstrate how to use Haystack to implement common NLP use cases ☆107 Updated last week
- create your rotating proxy server with docker. self hosted rotating proxy service. ☆172 Updated last year
- Zyte Automatic Extraction integration for Scrapy ☆55 Updated 2 years ago
- Python SEO keywords suggestion tool. Google Autocomplete, People Also Ask and Related Searches. ☆109 Updated last year
- ☆72 Updated this week