rss161030 / ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala
I implemented various ETL processes: loading data from MySQL to HDFS using Sqoop, transforming the data using Spark and Scala, performing analytics using Spark and Scala, and loading the results back to HDFS.
☆11 Updated 7 years ago
Alternatives and similar repositories for ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala:
Users that are interested in ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala are comparing it to the libraries listed below
- Repository used for Spark Trainings ☆53 Updated last year
- Counting Tweets Per User in Real-Time ☆41 Updated 7 years ago
- Twitter Sentiment Analysis using Spark and Kafka ☆114 Updated 5 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆41 Updated 5 years ago
- Educational notes, hands-on problems w/ solutions for Hadoop ecosystem ☆86 Updated 5 years ago
- ETL pipeline using PySpark (Spark - Python) ☆112 Updated 4 years ago
- Spark Examples ☆125 Updated 2 years ago
- Apache Spark™ and Scala Workshops ☆263 Updated 5 months ago
- ☆147 Updated 2 years ago
- Examples To Help You Learn Apache Spark ☆77 Updated 6 years ago
- Simple example for Spark Streaming over Kafka topic ☆106 Updated 4 years ago
- This repository focuses on gathering and making a curated list of resources to learn Hadoop for FREE. ☆51 Updated 6 years ago
- Guide for Databricks Spark certification ☆58 Updated 3 years ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 Updated 5 years ago
- ☆37 Updated 8 years ago
- PySpark-ETL ☆23 Updated 5 years ago
- Spark Structured Streaming / Kafka / Cassandra / Elastic ☆183 Updated last year
- Apache Spark Course Material ☆86 Updated last year
- ☆148 Updated 6 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆53 Updated last year
- Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computin… ☆24 Updated last year
- Self-contained examples of Apache Spark streaming integrated with Apache Kafka. ☆199 Updated 6 years ago
- PySpark boilerplate for running prod-ready data pipeline ☆28 Updated 3 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆44 Updated 4 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 Updated last year
- (These solutions were tested on a 4-node Hortonworks cluster on my laptop. Do not test on your production environment until you test... :) ☆21 Updated 4 years ago
- Real-world Spark pipeline examples ☆84 Updated 6 years ago
- ☆11 Updated 5 years ago
- Project for James' Apache Spark with Scala course ☆127 Updated 4 years ago