DeepHiveMind / Modern_E2E_ServerlessDataPipeline_NextGenDataLake
Next Gen Serverless E2E Data Pipeline & Workflow Orchestration | Modern Data Lake
☆10Updated 4 years ago
Alternatives and similar repositories for Modern_E2E_ServerlessDataPipeline_NextGenDataLake:
Users who are interested in Modern_E2E_ServerlessDataPipeline_NextGenDataLake are comparing it to the repositories listed below
- Welcome to the wonderland of "AI" = f(DL, RL, DRL, ML, NLP, KG, MLOPS)☆23Updated 2 years ago
- Real-World AI/ML Ecosystem | Enterprise AI Platform Recipe | Custom MLOPS☆20Updated 2 years ago
- Implementing best practices for PySpark ETL jobs and applications.☆1,886Updated 2 years ago
- Price Crawler - Tracking Price Inflation☆185Updated 4 years ago
- StreamSoft enables real-time analysis of any stock market☆13Updated 11 months ago
- A place to learn and explore PySpark Streaming and PySpark Structured Streaming, hands-on. Let's get started ...☆17Updated 4 years ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling☆101Updated 4 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow☆142Updated 4 years ago
- This project aims to move data from a relational database system (RDBMS) to the Hadoop Distributed File System (HDFS)☆10Updated 2 years ago
- Distributed Data Mesh 2.0 | DataMesh-as-a-Code on Cloud | Theory to Industrialization☆36Updated 2 years ago
- My documents for self-learning the fundamentals of data engineering☆12Updated last year
- Resources and projects from the Udacity Data Engineering with AWS Nanodegree programme☆25Updated 2 years ago
- Final project for the IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real time☆70Updated 8 years ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR☆82Updated 5 years ago
- Docker with Airflow and Spark standalone cluster☆254Updated last year
- A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from loc…☆22Updated 2 years ago
- Guide for the Databricks Spark certification☆58Updated 3 years ago
- Solutions for the IBM Data Engineering Professional Certificate☆25Updated 5 months ago
- Fundamentals of Spark with Python (using PySpark), code examples☆344Updated 2 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio…☆54Updated last year
- ETL pipeline using PySpark (Spark - Python)☆113Updated 5 years ago
- I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti…☆29Updated last year
- Four different big datasets joined to get a single table for final data analysis. Fraud detection by taking into consideration different key feat…☆46Updated 4 years ago
- Cloudera CCA175 Spark and Hadoop Developer exam preparation☆16Updated 7 years ago
- PySpark RDD, DataFrame and Dataset examples in Python☆1,245Updated last year
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree☆74Updated last year