bitmakerla / estela
estela, an elastic web scraping cluster 🕸
★172 · Updated last week
Related projects
Alternatives and complementary repositories for estela
- Scrapy rotation proxy package with advanced functions ★93 · Updated 2 years ago
- Page Object pattern for Scrapy ★119 · Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ★124 · Updated last week
- A fork of Dragnet that also extracts author, headline, date and keywords from context, as well as built-in metadata extraction, all in one pac… ★234 · Updated 10 months ago
- Comprehensive wrapper and execution manager for the Chrome browser using the Chrome Debugging Protocol. ★218 · Updated last year
- Zyte Automatic Extraction integration for Scrapy ★55 · Updated 2 years ago
- Web scraping Page Objects core library ★95 · Updated 3 weeks ago
- Common interface for data container classes ★62 · Updated 3 weeks ago
- Awesome list of Scrapy tools and libraries ★55 · Updated 4 years ago
- Scrapy extension for monitoring spider execution. ★533 · Updated last week
- Home of the Ulixee Open Data Platform ★47 · Updated this week
- The Web Scraping Club Free Repository ★127 · Updated last week
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ★37 · Updated 3 months ago
- Library that helps use Puppeteer in Scrapy. ★51 · Updated last week
- Pyppeteer integration for Scrapy ★60 · Updated 3 years ago
- Minimal set of tools to conduct stealthy scraping. ★150 · Updated last year
- 🕷️ Scrapyd is an application for deploying and running Scrapy spiders. ★79 · Updated 3 weeks ago
- ★64 · Updated 7 months ago
- Scrapy project boilerplate done right ★43 · Updated last month
- A python package for finding e-mails, checking deliverability and more. ★46 · Updated 6 months ago
- Lightning-Fast, Adaptive Web Scraping for Python ★128 · Updated this week
- Intelligent browser header & fingerprint generator ★228 · Updated 5 months ago
- Scrapy + Puppeteer ★111 · Updated 3 years ago
- ★122 · Updated last year
- Scrapy download handler that can impersonate browsers' TLS signatures or JA3 fingerprints. ★106 · Updated last month
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ★69 · Updated 3 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ★415 · Updated last year
- 🦊 Anti-detect browser ★163 · Updated this week
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… ★89 · Updated last year
- A Scrapy middleware to bypass Cloudflare's anti-bot protection ★106 · Updated 3 years ago