bitmakerla / estela
estela, an elastic web scraping cluster 🕸
⭐ 184 · Updated last month
Alternatives and similar repositories for estela
Users who are interested in estela are comparing it to the libraries listed below.
- Scrapy rotation proxy package with advanced functions ⭐ 95 · Updated 3 years ago
- A fork of Dragnet that also extracts author, headline, date and keywords from context, as well as built-in metadata extraction, all in one pac… ⭐ 285 · Updated last month
- Lego AI Parser is an open-source application that uses OpenAI to parse the visible text of HTML elements. ⭐ 235 · Updated last year
- Use AWS Lambda functions as a proxy pool to scrape web pages. ⭐ 134 · Updated last year
- Library that helps use Puppeteer in Scrapy. ⭐ 52 · Updated last month
- Scrapy extension for monitoring spider execution. ⭐ 545 · Updated 3 months ago
- Page Object pattern for Scrapy ⭐ 123 · Updated last week
- Web scraping Page Objects core library ⭐ 102 · Updated 2 weeks ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ⭐ 37 · Updated 11 months ago
- A Python-based HTML to text conversion library, command-line client and web service. ⭐ 312 · Updated last month
- Clean, filter and sample URLs to optimize data collection: Python & command-line, deduplication, spam, content and language filters ⭐ 142 · Updated 6 months ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ⭐ 430 · Updated 2 years ago
- A complementary proxy to help use SPM with headless browsers ⭐ 108 · Updated 2 years ago
- Get structured JSON data from any page. ⭐ 176 · Updated last year
- Scrapy project boilerplate done right ⭐ 48 · Updated 5 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ⭐ 133 · Updated 6 months ago
- Zyte Automatic Extraction integration for Scrapy ⭐ 56 · Updated 3 years ago
- dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators ⭐ 429 · Updated 3 months ago
- Make sense of it all. Semantic data modeling and analytics with a sprinkle of AI. https://totalhack.github.io/zillion/ ⭐ 201 · Updated last month
- Private ChatGPT/Perplexity. Securely unlocks knowledge from confidential business information. ⭐ 67 · Updated 9 months ago
- The Architecture of a Web Crawler: Building a Google-Inspired Distributed Web Crawler ⭐ 117 · Updated 7 months ago
- Scrapy + Puppeteer ⭐ 110 · Updated 4 years ago
- Article extraction benchmark: dataset and evaluation scripts ⭐ 318 · Updated last year
- 🕷️ Scrapyd is an application for deploying and running Scrapy spiders. ⭐ 85 · Updated 2 months ago
- Common Crawl extractor ⭐ 77 · Updated last year
- 🕶 Awesome list of Scrapy tools and libraries ⭐ 59 · Updated 5 years ago
- Spider ported to Python ⭐ 87 · Updated 5 months ago
- Python port of Boilerpipe library ⭐ 88 · Updated 10 months ago
- Python SDK for Inngest: Durable functions and workflows in Python, hosted anywhere ⭐ 103 · Updated this week
- hotpdf is a fast PDF parsing library to extract text and find text within PDF documents built on top of pdfminer.six ⭐ 196 · Updated 7 months ago