DIYBigData / spark-data-analysis-projects
A collection of data analysis projects done using PySpark via Jupyter notebooks.
☆10 · Updated 2 years ago
Alternatives and similar repositories for spark-data-analysis-projects:
Users who are interested in spark-data-analysis-projects are comparing it to the libraries listed below.
- PySpark in Google Colab: A simple machine learning (Linear Regression) model ☆36 · Updated 5 years ago
- PySpark DataFrame made easy ☆16 · Updated 3 years ago
- ☆19 · Updated 6 years ago
- My presentation at ODSC India 2018 about Deep Learning with Apache Spark ☆27 · Updated 6 years ago
- Money Laundering Detector aims to prove the hypothesis that a solution powered by Machine Learning and Behaviour Analytics will find… -> cu… ☆21 · Updated 6 years ago
- Python Machine Learning (ML) project that demonstrates the archetypal ML workflow within a Jupyter notebook, with automated model deploym… ☆61 · Updated 2 years ago
- Exploratory Data Analysis with Pandas and Python 3.x, published by Packt ☆44 · Updated 2 years ago
- Spark and Python (PySpark) Examples ☆40 · Updated 3 years ago
- Real-time social media data analytics with Apache Spark, Python, Kafka, Pandas, etc. ☆51 · Updated 8 years ago
- Building simple ML apps with Streamlit ☆24 · Updated 3 years ago
- Created data models, built data warehouses and data lakes, automated data pipelines, and worked with massive datasets. ☆13 · Updated 5 years ago
- Develop ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS ☆17 · Updated 2 years ago
- Learning from multiple companies in Silicon Valley: Netflix, Facebook, Google, and startups ☆16 · Updated 6 years ago
- A Scalable Data Cleaning Library for PySpark. ☆26 · Updated 5 years ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on… ☆26 · Updated 2 years ago
- Predict the poverty of households in Costa Rica using automated feature engineering. ☆23 · Updated 4 years ago
- PySpark, Databricks, h2o, MLlib ☆18 · Updated 8 years ago
- Final project for the "IoT: Big Data Processing and Analytics" class: analyzing U.S. nationwide temperature data from IoT sensors in real time ☆69 · Updated 8 years ago
- Code for my presentation: Using PySpark to Process Boat Loads of Data ☆20 · Updated 7 years ago
- Here's how to get DataQuest's Data Engineering Track missions' content to work on your localhost. Using data from my Valenbisi ARIMA mode… ☆15 · Updated 6 years ago
- Because it's never too late to start taking notes and make them 'public'... ☆60 · Updated 3 months ago
- A short tutorial notebook on PySpark ☆15 · Updated 9 years ago
- A series of workshop modules introducing the Feast feature store. ☆19 · Updated 2 years ago
- Contains code and presentation for my interactive hack session, 'Effective Feature Engineering: A Structured Approach to Building Better … ☆30 · Updated 4 years ago
- Prediction of loan defaulters based on more than 5 lakh (500,000) records using Python, NumPy, Pandas, and XGBoost ☆61 · Updated 2 years ago
- A demo of using Kafka, Spark, Hive, Cassandra, etc. with Docker. It produces a production-ready environment for any kind of big d… ☆32 · Updated 5 years ago
- ☆15 · Updated 10 years ago
- A repository for a PySpark Cookbook by Tomasz Drabas and Denny Lee ☆60 · Updated 6 years ago
- Streamlit example showing scikit-learn & PySpark ML over healthcare data! It's simple! ☆30 · Updated 4 years ago
- Machine learning case study on customer segmentation and prediction of groups. ☆31 · Updated 6 years ago