bitmakerla / estela
estela, an elastic web scraping cluster 🕸
☆180, updated 2 months ago
Alternatives and similar repositories for estela
Users that are interested in estela are comparing it to the libraries listed below
- Scrapy rotation proxy package with advanced functions (☆95, updated 2 years ago)
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac… (☆276, updated last year)
- Scrapy download handler that can impersonate browsers' TLS signatures or JA3 fingerprints. (☆154, updated this week)
- Page Object pattern for Scrapy (☆121, updated this week)
- The Web Scraping Club Free Repository (☆141, updated 2 weeks ago)
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… (☆36, updated 9 months ago)
- Minimal set of tools to conduct stealthy scraping. (☆156, updated 2 years ago)
- (☆74, updated 3 months ago)
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. (☆430, updated 2 years ago)
- A test suite of common scraper detection techniques. See how detectable your scraper stack is. (☆137, updated 2 years ago)
- Intelligent browser header & fingerprint generator (☆538, updated last month)
- Library that helps use puppeteer in scrapy. (☆52, updated last month)
- (☆131, updated last year)
- Home of the Ulixee Open Data Platform (☆50, updated 5 months ago)
- playwright stealth (☆675, updated 9 months ago)
- A blazing-fast Python HTTP Client with TLS fingerprint (☆379, updated this week)
- Web scraping Page Objects core library (☆99, updated 3 months ago)
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… (☆94, updated 2 years ago)
- Patching CDP (Chrome DevTools Protocol) leaks on OS level. Easy to use with Playwright, Selenium, and other web automation tools. (☆116, updated 8 months ago)
- Browser fingerprint data generator (☆57, updated last month)
- Scrapyd on container infrastructure (☆14, updated last month)
- A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-Python mode. (☆303, updated 5 months ago)
- Clean, filter and sample URLs to optimize data collection. Python & command-line. Deduplication, spam, content and language filters (☆137, updated 4 months ago)
- A Python-based HTML to text conversion library, command line client and Web service. (☆303, updated last month)
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. (☆70, updated 3 years ago)
- The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler (☆115, updated 5 months ago)
- A suite of tools for protecting the web's open knowledge. (☆127, updated 7 months ago)
- Use AWS Lambda functions as a proxy pool to scrape web pages. (☆131, updated last year)
- Zyte Automatic Extraction integration for Scrapy (☆56, updated 3 years ago)
- Get structured JSON data from any page. (☆175, updated last year)