avensolutions / spark-sql-etl-framework
Multi-stage, config driven, SQL based ETL framework using PySpark
☆26 Updated 6 years ago
Alternatives and similar repositories for spark-sql-etl-framework
Users who are interested in spark-sql-etl-framework are comparing it to the libraries listed below:
- Snowflake Data Source for Apache Spark. ☆230 Updated this week
- A simple Spark-powered ETL framework that just works 🍺 ☆182 Updated 2 months ago
- A simplified, lightweight ETL Framework based on Apache Spark ☆586 Updated last year
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆77 Updated 4 years ago
- ☆63 Updated 6 years ago
- This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse. ☆46 Updated 10 months ago
- DataQuality for BigData ☆144 Updated last year
- Data validation library for PySpark 3.0.0 ☆33 Updated 3 years ago
- Spline agent for Apache Spark ☆200 Updated this week
- Magic to help Spark pipelines upgrade ☆34 Updated last year
- Apache Spark ETL Utilities ☆39 Updated last year
- Delta Lake examples ☆234 Updated last year
- ☆16 Updated 6 years ago
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆123 Updated this week
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 6 years ago
- An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR ☆175 Updated 6 months ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆181 Updated 2 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆102 Updated 2 years ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 2 months ago
- Build configuration-driven ETL pipelines on Apache Spark ☆161 Updated 3 years ago
- Spark style guide ☆266 Updated last year
- A plugin to Apache Airflow to allow you to run Spark Submit Commands as an Operator ☆73 Updated 6 years ago
- Examples of Spark 3.0 ☆45 Updated 5 years ago
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆64 Updated 3 years ago
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆97 Updated last week
- Rules based grant management for Snowflake ☆41 Updated 6 years ago
- Data ingestion library for Amundsen to build graph and search index ☆204 Updated last year
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆90 Updated 2 years ago
- Sample Airflow DAGs ☆64 Updated 3 years ago
- Yet Another (Spark) ETL Framework ☆21 Updated 2 years ago