hiejulia / Data-pipeline-project
Data pipeline project
☆32 · Updated 2 months ago
Alternatives and similar repositories for Data-pipeline-project:
Users interested in Data-pipeline-project are comparing it to the repositories listed below.
- All my projects on Big Data are provided ☆27 · Updated 8 years ago
- A demo of using Kafka, Spark, Hive, Cassandra, etc. with Docker. It produces a production-ready environment for any kind of big d… ☆32 · Updated 5 years ago
- Plan, design and implement enterprise data infrastructure solutions and create the blueprints for an organization’s data management syste… ☆11 · Updated last year
- Data cleaning, pre-processing, and analytics on healthcare data using Spark and Python. ☆48 · Updated last year
- Counting Tweets Per User in Real-Time ☆42 · Updated 7 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… ☆54 · Updated last year
- Personal project where I perform some analytics (including Sentiment Analysis) over a Twitter Stream using Big Data Technologies of the H… ☆21 · Updated 2 years ago
- Big data projects implemented by Maniram Yadav ☆51 · Updated 6 years ago
- Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala. ☆95 · Updated 3 years ago
- This project is mainly for learning and practicing simple Hive commands in real-time scenarios. Here we have taken some sample coffee sho… ☆11 · Updated 7 years ago
- data engineering 100 days 🤖 🧲 🦾 | #DE ☆40 · Updated last year
- Deployed a Kafka instance on an AWS EC2 instance to stream data into Databricks ☆10 · Updated last year
- Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real time ☆70 · Updated 8 years ago
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming ☆55 · Updated 6 years ago
- Classwork projects and homework done through the Udacity Data Engineering Nanodegree ☆74 · Updated last year
- Twitter Sentiment Analysis using Spark and Kafka ☆115 · Updated 5 years ago
- Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collab… ☆37 · Updated 5 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark) ☆40 · Updated 5 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆44 · Updated last year
- An end-to-end real-time stock market data pipeline with Python, AWS EC2, Apache Kafka, and Cassandra. Data is processed on AWS EC2 with Apa… ☆25 · Updated last year
- A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra. ☆177 · Updated 2 years ago
- Apache Spark Interview Questions and Answers ☆20 · Updated 4 years ago
- Project for real-time anomaly detection using Kafka and python ☆58 · Updated 2 years ago
- Code examples on Apache Spark using python ☆107 · Updated 2 years ago
- ☆32 · Updated last year
- Data Quest - Data Engineer Learning and Projects ☆24 · Updated 5 years ago
- This repository implements a real-time credit card fraud detection pipeline using Kafka, Spark and Cassandra. Kafka continuously produces… ☆19 · Updated 4 years ago
- PySpark-ETL ☆23 · Updated 5 years ago
- A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and D… ☆30 · Updated 4 years ago
- Build a scikit-learn model to predict churn using customer telco data. ☆16 · Updated 4 months ago