rss161030 / ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala
I implemented various ETL processes: loading data from MySQL into HDFS using Sqoop, transforming the data using Spark and Scala, performing analytics using Spark and Scala, and loading the data back to HDFS.
☆11 Updated 7 years ago
Alternatives and similar repositories for ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala:
Users who are interested in ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala are comparing it to the repositories listed below.
- ETL pipeline using PySpark (Spark - Python) ☆114 Updated 5 years ago
- Educational notes, hands-on problems w/ solutions for Hadoop ecosystem ☆87 Updated 6 years ago
- Spark Examples ☆125 Updated 3 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆54 Updated 2 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆40 Updated 5 years ago
- Spark Structured Streaming / Kafka / Cassandra / Elastic ☆183 Updated 2 years ago
- PySpark-ETL ☆23 Updated 5 years ago
- Simple repo to demonstrate how to submit a Spark job to EMR from Airflow ☆33 Updated 4 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆122 Updated last year
- Apache Spark 3 - Structured Streaming Course Material ☆45 Updated 4 years ago
- Repository used for Spark Trainings ☆53 Updated 2 years ago
- Twitter Sentiment Analysis using Spark and Kafka ☆115 Updated 6 years ago
- Docker with Airflow and Spark standalone cluster ☆256 Updated last year
- ☆11 Updated 6 years ago
- Apache Spark Course Material ☆89 Updated 2 years ago
- Self-contained examples of Apache Spark streaming integrated with Apache Kafka. ☆199 Updated 7 years ago
- This repo is mostly created for PySpark and Hive related interview questions. ☆47 Updated 3 years ago
- Apache Spark™ and Scala Workshops ☆264 Updated 9 months ago
- ☆20 Updated 5 years ago
- Big Data Engineering practice project, including ETL with Airflow and Spark using AWS S3 and EMR ☆83 Updated 5 years ago
- Databricks - Apache Spark™ - 2X Certified Developer ☆267 Updated 4 years ago
- ☆150 Updated 7 years ago
- The official repository for the Rock the JVM Spark Optimization with Scala course ☆57 Updated last year
- For Udemy students: the official repository of Rock the JVM's Spark Streaming course ☆26 Updated 2 years ago
- Ravi Azure ADB ADF Repository ☆66 Updated 3 months ago
- PySpark functions and utilities with examples. Assists ETL process of data modeling ☆102 Updated 4 years ago
- Getting started with Spark, Spark streaming, Spark SQL and DataFrame. ☆48 Updated 6 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 Updated last year
- Real-world Spark pipeline examples ☆83 Updated 7 years ago