bitmakerla / estela
estela, an elastic web scraping cluster 🕸
☆191 · Updated last week
Alternatives and similar repositories for estela
Users that are interested in estela are comparing it to the libraries listed below
- The Web Scraping Club Free Repository ☆153 · Updated 2 weeks ago
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac… ☆297 · Updated 6 months ago
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. ☆235 · Updated last year
- Scrapy rotation proxy package with advanced functions ☆95 · Updated 3 years ago
- ☆77 · Updated 4 months ago
- Home of the Ulixee Open Data Platform ☆56 · Updated 2 months ago
- ☆143 · Updated 2 years ago
- Library that helps use puppeteer in scrapy. ☆52 · Updated 3 months ago
- Page Object pattern for Scrapy ☆124 · Updated last month
- Web scraping Page Objects core library ☆102 · Updated 3 weeks ago
- Web scraping for humans ☆967 · Updated 11 months ago
- Minimal set of tools to conduct stealthy scraping. ☆161 · Updated 2 years ago
- Use AWS Lambda functions as a proxy pool to scrape web pages. ☆139 · Updated last year
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆436 · Updated 2 years ago
- Clean, filter and sample URLs to optimize data collection - Python & command-line - Deduplication, spam, content and language filters ☆150 · Updated 3 weeks ago
- A Python-based HTML to text conversion library, command-line client and web service. ☆325 · Updated this week
- Scrapy Extension for monitoring spiders execution. ☆548 · Updated 7 months ago
- 🕷️ Scrapyd is an application for deploying and running Scrapy spiders. ☆87 · Updated 2 months ago
- playwright stealth ☆832 · Updated last year
- Get structured JSON data from any page. ☆178 · Updated 2 years ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 · Updated 4 years ago
- Scrapy project boilerplate done right ☆48 · Updated 9 months ago
- A drop-in replacement for playwright patched with rebrowser-patches. It allows passing modern automation detection tests. ☆36 · Updated 6 months ago
- Zyte Automatic Extraction integration for Scrapy ☆56 · Updated 3 years ago
- Spider templates for automatic crawlers. ☆32 · Updated last month
- dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators ☆428 · Updated 8 months ago
- Intelligent browser header & fingerprint generator ☆831 · Updated 8 months ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆39 · Updated last year
- Detect and classify pagination links ☆103 · Updated last month
- Patching CDP (Chrome DevTools Protocol) leaks on OS level. Easy to use with Playwright, Selenium, and other web automation tools. ☆148 · Updated last month