zekeriyyaa / Apache-Spark-Structured-Streaming-Via-Docker-Compose
☆13 · Updated 2 years ago
Alternatives and similar repositories for Apache-Spark-Structured-Streaming-Via-Docker-Compose
Users interested in Apache-Spark-Structured-Streaming-Via-Docker-Compose are comparing it to the libraries listed below.
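The repositories below all revolve around Spark, containerized data stacks, and ETL pipelines. For orientation, here is a minimal PySpark Structured Streaming sketch of the kind of job the title repository targets; it is illustrative only, not code from that repository, and the socket host/port are placeholder assumptions you would point at a service from your own docker-compose setup.

```python
# Minimal PySpark Structured Streaming word-count sketch (illustrative only).
# Reads lines from a TCP socket (e.g. `nc -lk 9999`), counts words, and
# prints the running counts to the console on each micro-batch.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("StructuredStreamingSketch").getOrCreate()

# Source: a socket stream; host/port are assumptions, adjust to your setup.
lines = (
    spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split each incoming line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Sink: write the complete running result to the console.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```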
- code snippet for analytics sessions ☆34 · Updated 3 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 3 years ago
- ☆88 · Updated 3 years ago
- This repo contains live examples to build Databricks' Lakehouse and recommended best practices from the field. ☆22 · Updated last year
- Course Material ☆25 · Updated 2 years ago
- Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector. ☆40 · Updated 2 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆181 · Updated 3 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆104 · Updated 4 years ago
- A Pyspark job to handle upserts, conversion to parquet and create partitions on S3 ☆28 · Updated 5 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 · Updated 2 years ago
- Serverless ETL and Analytics with AWS Glue, published by Packt ☆52 · Updated 2 years ago
- Spark app to merge different schemas ☆23 · Updated 4 years ago
- Snowflake Cookbook, published by Packt ☆81 · Updated 2 years ago
- ☆49 · Updated 9 months ago
- Snowflake Data Engineering in Action ☆35 · Updated last year
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 3 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce. ☆38 · Updated 3 years ago
- Productionalizing Data Pipelines with Apache Airflow ☆115 · Updated 3 years ago
- Python library for automating administration and data science in Strategy One environments ☆96 · Updated 2 weeks ago
- Data engineering with dbt, published by Packt ☆87 · Updated 2 months ago
- Developed a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation,… ☆90 · Updated 3 years ago
- This repository contains ready-to-use notebook examples for a wide variety of use cases in Amazon EMR Studio. ☆52 · Updated 2 years ago
- Spark data pipeline that processes movie ratings data. ☆30 · Updated last month
- Amazon Managed Workflows for Apache Airflow (MWAA) Examples repository contains example DAGs, requirements.txt, plugins, and CloudFormati… ☆116 · Updated 3 months ago
- AWS Quick Start Team ☆19 · Updated last year
- EverythingApacheNiFi ☆115 · Updated 2 years ago
- Data Engineering with Spark and Delta Lake ☆104 · Updated 2 years ago
- This project helps me to understand the core concepts of Apache Airflow. I have created custom operators to perform tasks such as staging… ☆93 · Updated 6 years ago
- Built a stream processing data pipeline to get data from disparate systems into a dashboard using Kafka as an intermediary. ☆29 · Updated 2 years ago
- ☆26 · Updated 5 years ago