omar-elmaria / python_scrapy_airflow_pipeline
This repo contains a full-fledged Python script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated with Airflow to run automatically.
☆14 · Updated 3 years ago
Alternatives and similar repositories for python_scrapy_airflow_pipeline
Users who are interested in python_scrapy_airflow_pipeline are comparing it to the libraries listed below.
- Cookiecutter template to build and deploy FastAPI backends, batteries included ☆170 · Updated 6 months ago
- Scrapy project boilerplate done right ☆48 · Updated last year
- Parsing JavaScript objects into Python data structures ☆217 · Updated 6 months ago
- Zyte API integration for Scrapy ☆40 · Updated this week
- Scrapfly Python SDK for headless browsers and proxy rotation ☆50 · Updated last month
- Learn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, & NoSQL with Cassandra via AstraDB. ☆149 · Updated 4 years ago
- Run a Scrapy spider programmatically from a script or a Celery task - no project required. ☆121 · Updated last year
- Showcase of MongoDB integration with the Python FastAPI framework, supported by Pydantic, as an API backend called FARM ☆47 · Updated last year
- Real-time monitoring tool for Celery ☆90 · Updated this week
- Spider templates for automatic crawlers. ☆34 · Updated last month
- Simple, robust email validation ☆133 · Updated 3 years ago
- FastAPI module for using cloud storage ☆25 · Updated 4 years ago
- Asynchronous alternative to the requests-ip-rotator library ☆45 · Updated last year
- Django starter kit that focuses on good defaults, developer experience, and deployment. Updated for Django 5.2. ☆235 · Updated 5 months ago
- Shortify is a URL shortener RESTful API built with Python and FastAPI ⚡ ☆136 · Updated last week
- Page Object pattern for Scrapy ☆126 · Updated 2 weeks ago
- Celery Tasks Monitoring Tool ☆199 · Updated 2 months ago
- Repository Patterns for Python ☆178 · Updated 2 years ago
- FastAPI-based, feature-rich backend for SaaS products and creating user dashboards. ☆64 · Updated last year
- SendinBlue's Python library for API v3 ☆83 · Updated 2 years ago
- Full TypeScript Next.js authentication using NextAuth.js in the frontend and Django Rest API as backend using JWT via username and passwo… ☆50 · Updated 2 years ago
- High-level file-system operations for lazy devs. ☆258 · Updated last week
- Django webhooks triggered on model changes ☆224 · Updated last year
- ☆23 · Updated last week
- Simple project to test Elasticsearch with Django, built on Docker. ☆10 · Updated 5 years ago
- Get help with your Wagtail content using AI superpowers. ☆184 · Updated 3 months ago
- Celery extension that allows orchestrating 100/1000/10000 tasks combined into a complex workflow ☆103 · Updated 2 years ago
- Web scraping Page Objects core library ☆104 · Updated 2 weeks ago
- Pick the most common user-agents on the Internet 👻 ☆172 · Updated 5 years ago
- A package to get a list of user agents based on filters such as operating system, software name, etc. ☆103 · Updated last year