Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
☆1,662 · Mar 16, 2024 · Updated 2 years ago
Alternatives and similar repositories for spark-py-notebooks
Users that are interested in spark-py-notebooks are comparing it to the libraries listed below.
- An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset ☆828 · Oct 6, 2021 · Updated 4 years ago
- Ways of doing Data Science Engineering and Machine Learning in R and Python ☆618 · Apr 25, 2021 · Updated 4 years ago
- R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks ☆122 · Sep 6, 2017 · Updated 8 years ago
- PySpark-Tutorial provides basic algorithms using PySpark ☆1,275 · May 26, 2025 · Updated 10 months ago
- Apache Spark (PySpark) Practice on Real Data ☆271 · Jan 31, 2020 · Updated 6 years ago
- PySpark + Scikit-learn = Sparkit-learn ☆1,149 · Dec 31, 2020 · Updated 5 years ago
- Code repository for Learning PySpark by Packt ☆343 · Jan 30, 2023 · Updated 3 years ago
- Code snippets and tutorials for working with social science data in PySpark ☆418 · Aug 11, 2017 · Updated 8 years ago
- Fundamentals of Spark with Python (using PySpark), code examples ☆364 · Oct 29, 2022 · Updated 3 years ago
- Code base for the Learning PySpark book (in preparation) ☆630 · Apr 16, 2019 · Updated 6 years ago
- A curated list of awesome Apache Spark packages and resources. ☆1,868 · Feb 27, 2026 · Updated last month
- Jupyter notebooks for pyspark tutorials given at University ☆110 · Jan 7, 2026 · Updated 2 months ago
- A free tutorial for Apache Spark. ☆992 · Jan 5, 2026 · Updated 2 months ago
- Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce,… ☆28,946 · Mar 20, 2024 · Updated 2 years ago
- Interactive and Reactive Data Science using Scala and Spark. ☆3,147 · May 16, 2023 · Updated 2 years ago
- Getting started with PySpark and MLlib ☆299 · May 7, 2018 · Updated 7 years ago
- Updated repository ☆157 · Nov 25, 2021 · Updated 4 years ago
- Learn the pyspark API through pictures and simple examples ☆170 · Jan 23, 2021 · Updated 5 years ago
- pyspark sample scripts ☆16 · Jan 9, 2019 · Updated 7 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 6 months ago
- Implementing best practices for PySpark ETL jobs and applications. ☆2,086 · Jan 1, 2023 · Updated 3 years ago
- A wine recommender system tutorial using Python technologies such as Django, Pandas, or Scikit-learn, and others such as Bootstrap. ☆347 · Mar 17, 2018 · Updated 8 years ago
- Very basic introduction to pyspark ☆15 · Mar 20, 2017 · Updated 9 years ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ☆3,858 · Jul 10, 2023 · Updated 2 years ago
- OnLine Spectral Search ENgine for Proteomics big data using Apache Spark, Python/Flask, and AngularJS ☆15 · Sep 14, 2015 · Updated 10 years ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark ☆1,538 · Dec 2, 2024 · Updated last year
- Jupyter notebooks from the scikit-learn video series ☆3,782 · Mar 5, 2024 · Updated 2 years ago
- The "Python Machine Learning (1st edition)" book code repository and info resource ☆12,603 · Nov 20, 2024 · Updated last year
- Notes on Apache Spark (pyspark) ☆299 · Mar 3, 2019 · Updated 7 years ago
- LearningApacheSpark ☆250 · Jan 3, 2024 · Updated 2 years ago
- Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin ☆52 · May 13, 2016 · Updated 9 years ago
- Repository of teaching materials, code, and data for my data analysis and machine learning projects. ☆6,662 · Jun 21, 2023 · Updated 2 years ago
- Apache Spark - A unified analytics engine for large-scale data processing ☆43,041 · Updated this week
- This repository contains Spark, MLlib, PySpark and Dataframes projects ☆49 · Oct 22, 2017 · Updated 8 years ago
- Code to accompany Advanced Analytics with Spark from O'Reilly Media ☆1,526 · Sep 25, 2024 · Updated last year
- Information for setting up for the BerkeleyX Spark Intro MOOC, and lab assignments for the course ☆346 · Mar 19, 2021 · Updated 5 years ago
- Ready-to-run Docker images containing Jupyter applications ☆8,423 · Mar 22, 2026 · Updated last week
- TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2) ☆43,789 · Jul 26, 2024 · Updated last year
- Distributed Deep learning with Keras & Spark ☆1,578 · May 1, 2023 · Updated 2 years ago