teticio / lambda-scraper
Use AWS Lambda functions as a proxy pool to scrape web pages.
☆135Updated last year
Alternatives and similar repositories for lambda-scraper
Users who are interested in lambda-scraper are comparing it to the libraries listed below.
- The Web Scraping Club Free Repository☆147Updated 2 months ago
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac…☆288Updated 2 months ago
- estela, an elastic web scraping cluster 🕸☆185Updated last week
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements.☆235Updated last year
- Minimal set of tools to conduct stealthy scraping.☆159Updated 2 years ago
- Get structured JSON data from any page.☆177Updated last year
- This is the ultimate web scraping tool for extracting the most relevant data points from products on Walmart.com! This powerful scraper i…☆15Updated 2 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.☆431Updated 2 years ago
- AI article writer to automatically generate articles with 1,500-7,000+ words to boost your website's SEO and make it more alive☆25Updated last year
- Staff fetcher library for LinkedIn - obtain experiences, schools, skills & contact info☆171Updated last month
- This bot mass-DMs a specified message to Reddit users from a list.☆57Updated 3 months ago
- G2 Scraper helps you collect G2 product data, including names, product descriptions, reviews, ratings, comparisons, alternatives, and mor…☆48Updated 6 months ago
- Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.☆268Updated 3 months ago
- A Python package for finding e-mails, checking deliverability and more.☆70Updated last year
- This is a proof-of-concept of using an LLM to find and extract meaningful data without parsing the html too much.☆29Updated 2 years ago
- Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular…☆72Updated last week
- A library to extract the main content from html. Developed for information on LLM and for feeding data into LangChain and LlamaIndex.☆45Updated last year
- A fork of https://github.com/AtuboDad/playwright_stealth☆123Updated last month
- The GPT-based Universal Web Scraper MVP is a solution that leverages GPT models and web scraping libraries to generate scraper code based…☆266Updated last year
- Browser automation engine benchmark - Test bypass rates, performance & stealth against Cloudflare, DataDome, reCAPTCHA and other bot dete…☆59Updated last week
- The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler☆120Updated 7 months ago
- Base Docker images for Apify actors.☆82Updated this week
- Unflare helps you to bypass Cloudflare protection☆135Updated 2 weeks ago
- Detects the presence of anti-bot and fingerprinting technologies on websites by analyzing requests, headers, cookies, and more. Built on …☆48Updated 9 months ago
- Undetected web-scraping & seamless HTML parsing in Python!☆276Updated 3 weeks ago
- Self-hosted version of Microsoft's OmniParser Image-to-text model☆71Updated 2 months ago
- Cloud crawler functions for scrapeulous☆45Updated 4 years ago
- Free IP proxy rotator library for Python☆263Updated last week
- Automate the world of LinkedIn!☆104Updated 4 months ago
- playwright stealth☆745Updated last year