icanbwell / SparkPipelineFramework
Framework for simpler Spark Pipelines
☆11 Updated this week
Alternatives and similar repositories for SparkPipelineFramework
Users who are interested in SparkPipelineFramework are comparing it to the libraries listed below.
- AWS Quick Start Team ☆19 Updated last year
- ☆95 Updated 2 years ago
- ☆72 Updated last year
- Automated data quality suggestions and analysis with Deequ on AWS Glue ☆90 Updated 3 years ago
- A Python API for Asynchronously Loading Data into Snowflake DB ☆68 Updated 3 months ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆56 Updated 2 years ago
- Amazon SageMaker Best Practices, published by Packt ☆29 Updated last week
- Airflow training for the crunch conf ☆105 Updated 7 years ago
- ☆24 Updated 3 years ago
- A CLI to manage and monitor permissions in AWS Lake Formation ☆25 Updated 3 years ago
- This repository shows a sample example to build, manage and orchestrate Machine Learning workflows using Amazon Sagemaker and Apache Airf… ☆138 Updated 4 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆183 Updated 4 years ago
- This is the documentation for the Amazon Redshift Developer Guide ☆121 Updated 2 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆115 Updated last week
- A Snowflake Sandbox for Data Science ☆36 Updated 4 years ago
- Snowflake Cookbook, published by Packt ☆83 Updated 3 years ago
- ☆54 Updated 2 years ago
- Fake Pandas / PySpark DataFrame creator ☆48 Updated last year
- ☆29 Updated 2 years ago
- Amazon Managed Workflows for Apache Airflow (MWAA) Examples repository contains example DAGs, requirements.txt, plugins, and CloudFormati… ☆117 Updated 2 months ago
- This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/) which creates a concu… ☆77 Updated 7 years ago
- Lab Instructions for Data Engineering Immersion Day ☆196 Updated 2 weeks ago
- Quickstart PySpark with Anaconda on AWS/EMR using Terraform ☆48 Updated last year
- Examples to get you started with Amazon Redshift ML ☆20 Updated 2 years ago
- Spark app to merge different schemas ☆23 Updated 5 years ago
- Example repo to create end to end tests for data pipeline. ☆25 Updated last year
- The open source version of the AWS Glue docs. You can submit feedback & requests for changes by submitting issues in this repo or by maki… ☆201 Updated 2 years ago
- ☆51 Updated 3 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆52 Updated 2 years ago
- Sample code with integration between Data Catalog and Hive data source. ☆24 Updated last year