yennanliu / AirflowJob
Airflow POC demo: 1) environment setup 2) Airflow DAG 3) Spark/ML pipeline | #DE
☆12 Updated 2 years ago
Alternatives and similar repositories for AirflowJob:
Users who are interested in AirflowJob are comparing it to the repositories listed below:
- Event-triggered plugins for Airflow ☆21 Updated 5 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆74 Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- ETL pipeline using PySpark (Spark - Python) ☆112 Updated 4 years ago
- Sample Airflow DAGs ☆62 Updated 2 years ago
- A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0 ☆25 Updated 3 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆28 Updated last year
- Use Airflow to move data from multiple MySQL databases to BigQuery ☆99 Updated 4 years ago
- Creates simple data models on Snowflake to report dbt source freshness and tests ☆23 Updated last year
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆75 Updated 5 years ago
- Execution of dbt models using Apache Airflow through Docker Compose ☆113 Updated 2 years ago
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆82 Updated 9 months ago
- New generation open-source data stack ☆65 Updated 2 years ago
- Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase ☆124 Updated 2 years ago
- REST-like API exposing Airflow data and operations ☆61 Updated 6 years ago
- A full data warehouse infrastructure with ETL pipelines running inside Docker on Apache Airflow for data orchestration, AWS Redshift for … ☆133 Updated 4 years ago
- Multi-stage, config-driven, SQL-based ETL framework using PySpark ☆25 Updated 5 years ago
- SQL-based transforms compatible with Rasgo and PyRasgo ☆24 Updated 9 months ago
- Visualize dependencies between Airflow DAGs ☆49 Updated 3 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Slowly Changing Dimension type 2 in Hive query language, using an exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 Updated 5 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆106 Updated this week
- Challenge for those applying to the Software Engineer, Big Data position ☆34 Updated 13 years ago
- Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages dat… ☆16 Updated 4 years ago
- Various data stream/batch processing demos with Apache Spark (Scala) 🚀 ☆11 Updated 4 years ago
- Delta-Lake, ETL, Spark, Airflow ☆46 Updated 2 years ago