rss161030 / ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala
I implemented various ETL processes: loading data from MySQL into HDFS using Sqoop, transforming the data using Spark and Scala, performing analytics using Spark and Scala, and loading the data back to HDFS.
☆11 Updated 7 years ago
Alternatives and similar repositories for ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala:
Users who are interested in ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala are comparing it to the repositories listed below.
- Spark Examples ☆125 Updated 3 years ago
- Guide for databricks spark certification ☆58 Updated 3 years ago
- Apache Spark Course Material ☆88 Updated last year
- ☆11 Updated 5 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆45 Updated 4 years ago
- ETL pipeline using pyspark (Spark - Python) ☆113 Updated 4 years ago
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 2 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- PySpark-ETL ☆23 Updated 5 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- Educational notes, hands-on problems w/ solutions for the Hadoop ecosystem ☆87 Updated 6 years ago
- The official repository for the Rock the JVM Spark Optimization with Scala course ☆57 Updated last year
- Spark Structured Streaming / Kafka / Cassandra / Elastic ☆183 Updated 2 years ago
- Repository used for Spark Trainings ☆53 Updated last year
- Simple repo to demonstrate how to submit a Spark job to EMR from Airflow ☆33 Updated 4 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆41 Updated 5 years ago
- Self-contained examples of Apache Spark streaming integrated with Apache Kafka. ☆199 Updated 6 years ago
- Counting Tweets Per User in Real-Time ☆42 Updated 7 years ago
- For Udemy students: the official repository of Rock the JVM's Spark Streaming course ☆26 Updated 2 years ago
- Apache Spark and Apache Kafka integration example ☆124 Updated 7 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 Updated last year
- Apache Spark™ and Scala Workshops ☆264 Updated 8 months ago
- Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficient… ☆55 Updated 2 years ago
- Spark on Kubernetes using Helm ☆34 Updated 4 years ago
- PySpark boilerplate for running prod-ready data pipelines ☆28 Updated 4 years ago
- ☆19 Updated 5 years ago
- Twitter Sentiment Analysis using Spark and Kafka ☆115 Updated 5 years ago
- Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collab… ☆37 Updated 4 years ago
- Databricks - Apache Spark™ - 2X Certified Developer ☆266 Updated 4 years ago
- Oozie Samples ☆52 Updated 11 years ago