Spark code implemented in Jupyter notebooks. Topics include RDDs and DataFrames, exploratory data analysis (EDA), handling multiple DataFrames, visualization, and machine learning.
☆30 · Aug 26, 2020 · Updated 5 years ago
Alternatives and similar repositories for pySpark_tutorial
Users who are interested in pySpark_tutorial are comparing it to the repositories listed below.
- ☆18 · Nov 9, 2025 · Updated 3 months ago
- Movie Reviews Sentiment Analysis ☆12 · Jun 28, 2018 · Updated 7 years ago
- Produce Kafka messages, consume them, and upload them into Cassandra and MongoDB. ☆43 · Sep 26, 2023 · Updated 2 years ago
- Random Forest Regression ☆24 · Jun 1, 2018 · Updated 7 years ago
- EDA ☆23 · Dec 16, 2018 · Updated 7 years ago
- Learn React.js by building a reusable Survey application. We'll cover React v16.8 with a heavy focus on the use of React Hooks. ☆20 · Mar 27, 2019 · Updated 6 years ago
- Deep Learning Specialization course by IIT Roorkee (using Python, NumPy, pandas, sklearn, TensorFlow 2) ☆26 · Apr 12, 2024 · Updated last year
- ☆26 · Sep 4, 2018 · Updated 7 years ago
- This repository contains deeplearning4j examples for importing and making use of models trained in Keras ☆27 · May 7, 2017 · Updated 8 years ago
- ☆10 · Jun 21, 2021 · Updated 4 years ago
- Pyspark ☆29 · Aug 14, 2021 · Updated 4 years ago
- Data sets and ML models versioning example from DVC get started ☆10 · Jun 4, 2024 · Updated last year
- Learn various Algorithms of Machine Learning like SVC, Decision Tree, Random Forest, Logistic Regression, Linear Regression and much Mo… ☆11 · Jul 31, 2019 · Updated 6 years ago
- Framework for studying cryptographic hash functions using SAT. ☆10 · Dec 21, 2021 · Updated 4 years ago
- Simple Python script that converts all Excel files (xls, xlsx, xlsm, csv) in a directory into xlsb files. ☆10 · Mar 13, 2023 · Updated 2 years ago
- Automated Continuous Data Quality Measurement ☆12 · Nov 15, 2023 · Updated 2 years ago
- Natural Language Processing ☆11 · Jun 23, 2021 · Updated 4 years ago
- Python library for the simulation of probabilistic circuits. ☆11 · Feb 1, 2026 · Updated last month
- Repository to store the 4mula dataset ☆10 · Sep 1, 2021 · Updated 4 years ago
- Python3, NetworkX, Java, MLlib, Spark, Cassandra, Neo4j 3.0, Gephi, Docker ☆11 · Jul 18, 2017 · Updated 8 years ago
- Given the Live on board data of various drivers, a score corresponding to each driver is to be formulated, which will help insurance comp… ☆12 · Sep 13, 2018 · Updated 7 years ago
- ☆38 · Feb 23, 2026 · Updated last week
- https://liyasthomas.com ☆16 · Jan 21, 2022 · Updated 4 years ago
- Anaconda plugin for StarCluster ☆21 · Aug 14, 2024 · Updated last year
- Learn how to combine Nginx + wigs + load balancing + flask + unit testing + Docker ☆12 · Jun 2, 2021 · Updated 4 years ago
- Generative Adversarial Networks ☆10 · Feb 2, 2023 · Updated 3 years ago
- Introduction to Generative Adversarial Network ☆11 · Dec 19, 2019 · Updated 6 years ago
- CSC 424 Advanced Database Management Systems ☆16 · Jan 1, 2020 · Updated 6 years ago
- Exploratory Data Analysis and Data Visualisation of All Space Missions from 1957 Dataset. ☆12 · Jun 15, 2021 · Updated 4 years ago
- Bluez-Dubbing: A Modular End-to-End Multilingual AI System for Automatic Video Translation ☆25 · Feb 22, 2026 · Updated last week
- A scraper made using Beautiful Soup 4 in Python. Tailor-made for extracting news from moneycontrol.com. Issue pull request for different … ☆12 · Jun 21, 2020 · Updated 5 years ago
- ☆10 · Aug 12, 2024 · Updated last year
- The dataset contains Wikipedia comments which have been labeled by human raters for toxic behavior.