omar-elmaria / python_scrapy_airflow_pipeline
This repo contains a full-fledged Python script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated on Airflow to run automatically.
☆14 · Updated 3 years ago
Alternatives and similar repositories for python_scrapy_airflow_pipeline
Users who are interested in python_scrapy_airflow_pipeline compare it to the libraries listed below.
- Zyte API integration for Scrapy ☆39 · Updated last week
- Run a Scrapy spider programmatically from a script or a Celery task - no project required. ☆121 · Updated last year
- Parsing JavaScript objects into Python data structures ☆217 · Updated 5 months ago
- The Web Scraping Club Free Repository ☆158 · Updated 2 months ago
- Common interface for data container classes ☆68 · Updated 3 weeks ago
- Asynchronous alternative to the requests-ip-rotator library ☆45 · Updated last year
- Web scraping Page Objects core library ☆104 · Updated this week
- Spider templates for automatic crawlers. ☆34 · Updated 3 weeks ago
- Python clients for Zyte AutoExtract API ☆41 · Updated 4 years ago
- Celery worker for running asyncio coroutine tasks ☆59 · Updated 7 months ago
- Learn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, & NoSQL with Cassandra via AstraDB. ☆148 · Updated 4 years ago
- ☆68 · Updated 3 months ago
- Simple, robust email validation ☆133 · Updated 3 years ago
- Page Object pattern for Scrapy ☆125 · Updated this week
- Shortify is a URL shortener RESTful API built with Python and FastAPI ⚡ ☆135 · Updated 3 weeks ago
- Cookiecutter template to build and deploy FastAPI backends, batteries included ☆170 · Updated 6 months ago
- Web grep: search all rendered resources used by a URI ☆89 · Updated 2 months ago
- Celery Tasks Monitoring Tool ☆199 · Updated last month
- A Scrapy middleware to bypass Cloudflare's anti-bot protection ☆111 · Updated 4 years ago
- Real-Time monitoring tool for Celery ☆90 · Updated this week
- Scrapy project boilerplate done right ☆48 · Updated 11 months ago
- Detect and classify pagination links ☆105 · Updated last week
- Extract price amount and currency symbol from a raw text string ☆347 · Updated 3 months ago
- ☆60 · Updated last year
- Scrapfly Python SDK for headless browsers and proxy rotation ☆50 · Updated 3 weeks ago
- Helpful celery task queue extensions. ☆31 · Updated last month
- Library that helps use puppeteer in scrapy. ☆52 · Updated 5 months ago
- Repository Patterns for Python ☆178 · Updated 2 years ago
- Minimal set of tools to conduct stealthy scraping. ☆162 · Updated 2 years ago
- Software stack with latest Scrapy and updated deps ☆65 · Updated 2 weeks ago