vuthanhhai2302/Applied-Pyspark

My applied big data analytics project with PySpark.

☆10

Alternatives and similar repositories for Applied-Pyspark

Users that are interested in Applied-Pyspark are comparing it to the libraries listed below.

vuthanhhai2302 / Apply-machine-learning-on-data-analytics
My project of applied machine learning on data analytics, using pandas, numpy and scikit-learn to analyze data
☆10 · Aug 24, 2022 · Updated 3 years ago
vuthanhhai2302 / understand-asynchronous-programming
A part of my journey on being better, learning asynchronous programming
☆23 · Nov 21, 2023 · Updated 2 years ago
cevoaustralia / glue-vscode
Local Development of AWS Glue with Docker and Visual Studio Code
☆14 · Nov 29, 2021 · Updated 4 years ago
antimoz-om / Antimoz
A data engineering pipeline for digital marketers.
☆11 · Dec 21, 2018 · Updated 7 years ago
DIYBigData / spark-data-analysis-projects
A collection of data analysis projects done using PySpark via Jupyter notebooks.
☆10 · Oct 8, 2022 · Updated 3 years ago
Alexmhack / Django-Rasa-Sockets
Rasa Chatbot using Django backend and Sockets for communication
☆12 · Dec 8, 2022 · Updated 3 years ago
itversity / retail_db_json
☆14 · Sep 14, 2021 · Updated 4 years ago
LeonardoEmili / stock-price-forecasting
Distributed stock price forecasting system to predict S&P 500 stock prices.
☆11 · Nov 12, 2021 · Updated 4 years ago
RishiSankineni / Machine-Learning-Pipeline-LR-Pyspark
Power Plant ML Pipeline Application - Apache Spark
☆12 · Dec 12, 2016 · Updated 9 years ago
adityajain10 / pyspark-mlib-based-stock-predictor
PredictorFinc is a scalable supervised machine learning model that predicts stock price change through Decision Tree Regressor using data …
☆12 · Sep 5, 2023 · Updated 2 years ago
prakashdontaraju / google-cloud-ecommerce
ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…
☆11 · Mar 9, 2022 · Updated 4 years ago
minzhang-1 / PointHop-PointHop2_Spark
A fast and low memory requirement version of PointHop and PointHop++, which is built upon Apache Spark.
☆10 · Jul 14, 2020 · Updated 6 years ago
mateuspicanco / project-atlas-sao-paulo
A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.
☆12 · Jul 4, 2021 · Updated 5 years ago
MLWhiz / Spark_Projects
Spark Projects for the Berkeley Data Science Course
☆13 · Aug 12, 2015 · Updated 10 years ago
vsouza / spark-kinesis-redshift
Example project for consuming AWS Kinesis streaming data and saving it to Amazon Redshift using Apache Spark
☆11 · May 22, 2018 · Updated 8 years ago
AWS-Big-Data-Projects / Run-a-Spark-job-within-Amazon-EMR
Run a Spark job within Amazon EMR
☆12 · Sep 12, 2020 · Updated 5 years ago
AWS-Big-Data-Projects / AWS-EMR
Analyzing Big Data with Amazon EMR
☆12 · Sep 14, 2020 · Updated 5 years ago
camposvinicius / gcp-etl
This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…
☆15 · Nov 15, 2021 · Updated 4 years ago
ketgo / marshmallow-pyspark
Marshmallow serializer integration with pyspark
☆12 · Dec 29, 2023 · Updated 2 years ago
francescotescari / noiseprint2
noiseprint2 is a port of noiseprint to TensorFlow 2 and Keras
☆12 · Feb 20, 2021 · Updated 5 years ago
sbl-sdsc / mmtf-proteomics
Methods for mapping proteomics data on 3D protein structure.
☆15 · Jan 18, 2020 · Updated 6 years ago
codspire / chicago-taxi-trips-analysis
Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset
☆15 · Jul 16, 2017 · Updated 9 years ago
zekeriyyaa / Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data
Mobile robot data were analyzed with Apache Spark to extract five different statistical results such as travel time, waiting time, average…
☆15 · Apr 5, 2022 · Updated 4 years ago
ibaiGorordo / Tensorflow-Mobile-Generic-Object-Localizer
Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.
☆16 · Sep 18, 2021 · Updated 4 years ago
AWS-Big-Data-Projects / Analysing-Census-Data-using-aws
Use aws-emr and aws-redshift to analyse the adult census dataset of the USA
☆13 · Sep 11, 2020 · Updated 5 years ago
mpfishe2 / az-databricks-realtime-alert-system
Building a real-time alert monitoring pipeline that sends email notifications off of Azure Event Hubs, Azure Databricks, and an Azure Logi…
☆13 · Mar 8, 2020 · Updated 6 years ago
cvilla87 / PySpark-ETL-Telecom
Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli…
☆17 · Dec 3, 2018 · Updated 7 years ago
chaithanya21 / Sentiment-Analysis-using-Pyspark-on-Multi-Social-Media-Data
In this mini-project I have chosen to do sentiment analysis of social media websites such as Twitter and Reddit to gain insights into the…
☆12 · Mar 5, 2020 · Updated 6 years ago
big-data-lab-team / accident-prediction-montreal
☆12 · Dec 8, 2022 · Updated 3 years ago
rvilla87 / ETL-PySpark
ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS)
☆17 · Dec 18, 2018 · Updated 7 years ago
Ceci-Aguilera / habaneras_de_lino_api
Version 1 of Habaneras de Lino is an online ecommerce site. This repo contains the backend API of the website using Django and Django Rest Fram…
☆13 · Dec 16, 2022 · Updated 3 years ago
innat / Transfer-Learning-PySpark
Multi-Class Classification | Transfer Learning With PySpark
☆13 · Nov 12, 2019 · Updated 6 years ago
Kuntal-G / BigData-Analytics
Analytics projects using Big Data eco-systems (Hadoop, Spark, Storm)
☆17 · Dec 27, 2021 · Updated 4 years ago
Martialhimanshu / GaanaSuno
GaanaSuno is an application that lets users upload, store and play all of their music from the cloud. Additionally, a user can comment and…
☆12 · Aug 18, 2018 · Updated 7 years ago
MHassaanButt / Flight-Delays-Prediction
In this project, I used Decision Tree Learning Model as the main algorithm to build the model. Due to the large amount of flight data, we i…
☆12 · Dec 21, 2021 · Updated 4 years ago
ehsanmok / sparkling-titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
☆14 · Sep 23, 2016 · Updated 9 years ago
aurelienmorgan / abnormal_vibrations_watchdog
Time Series Anomaly detection. The monitored signal is made up of machinery vibration sensor measurements.
☆18 · Dec 7, 2020 · Updated 5 years ago
VinayChaudhari1996 / pyspark-dataframe-made-easy
pyspark dataframe made easy
☆16 · Dec 15, 2021 · Updated 4 years ago
heroku-examples / analytics-with-kafka-redshift-metabase
An example system that captures a large stream of product usage data, or events, and provides both real-time data visualization and SQL-b…
☆27 · Jan 11, 2023 · Updated 3 years ago