bitmakerla / estela
estela, an elastic web scraping cluster 🕸
★194 · Updated last month
Alternatives and similar repositories for estela
Users that are interested in estela are comparing it to the libraries listed below
- A fork of Dragnet that also extracts author, headline, date, and keywords from context, as well as built-in metadata extraction, all in one pac… · ★297 · Updated 7 months ago
- The Web Scraping Club Free Repository · ★156 · Updated last month
- Scrapy rotation proxy package with advanced functions · ★95 · Updated 3 years ago
- Lego AI Parser is an open-source application that uses OpenAI to parse visible text of HTML elements. · ★236 · Updated last year
- Home of the Ulixee Open Data Platform · ★56 · Updated 3 months ago
- Use AWS Lambda functions as a proxy pool to scrape web pages. · ★139 · Updated 2 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. · ★435 · Updated 3 years ago
- Clean, filter and sample URLs to optimize data collection (Python & command-line): deduplication, spam, content and language filters · ★155 · Updated 2 weeks ago
- Page Object pattern for Scrapy · ★125 · Updated 2 months ago
- Minimal set of tools to conduct stealthy scraping. · ★162 · Updated 2 years ago
- Web scraping Page Objects core library · ★104 · Updated 2 weeks ago
- Spider templates for automatic crawlers. · ★33 · Updated 3 weeks ago
- Scrapy extension for monitoring spider execution. · ★552 · Updated 8 months ago
- A Python-based HTML-to-text conversion library, command-line client and web service. · ★331 · Updated last month
- Zyte Automatic Extraction integration for Scrapy · ★56 · Updated 3 years ago
- ★78 · Updated 6 months ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… · ★38 · Updated last week
- Library that helps use Puppeteer in Scrapy. · ★52 · Updated 5 months ago
- Make sense of it all. Semantic data modeling and analytics with a sprinkle of AI. https://totalhack.github.io/zillion/ · ★205 · Updated 7 months ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. · ★70 · Updated 4 years ago
- Scrapy download handler that can impersonate browsers' TLS signatures or JA3 fingerprints. · ★212 · Updated 4 months ago
- Detect and classify pagination links · ★105 · Updated 2 weeks ago
- Web scraping for humans · ★976 · Updated last year
- Parsing JavaScript objects into Python data structures · ★217 · Updated 5 months ago
- Extract price amount and currency symbol from a raw text string · ★346 · Updated 2 months ago
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… · ★98 · Updated 3 years ago
- ★143 · Updated 2 years ago
- A Scrapy middleware to bypass Cloudflare's anti-bot protection · ★111 · Updated 4 years ago
- This is a proof-of-concept of using an LLM to find and extract meaningful data without parsing the HTML too much. · ★30 · Updated 2 years ago
- playwright stealth · ★851 · Updated last year