Code examples for Apache Spark using Python
☆108 · Aug 11, 2022 · Updated 3 years ago
Alternatives and similar repositories for pyspark-examples
Users that are interested in pyspark-examples are comparing it to the libraries listed below.
- This project is mainly for learning and practicing simple HIVE commands in real time scenarios. Here we have taken some sample coffee sho… ☆11 · Mar 1, 2018 · Updated 8 years ago
- ☆18 · Nov 9, 2025 · Updated 5 months ago
- Fundamentals of Spark with Python (using PySpark), code examples ☆363 · Oct 29, 2022 · Updated 3 years ago
- Spark and Python (PySpark) Examples ☆39 · Jul 7, 2021 · Updated 4 years ago
- Apache Hadoop - Docker distribution based on CentOS 7 and Oracle Java 8 ☆12 · Feb 20, 2018 · Updated 8 years ago
- PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2 ☆89 · Jan 3, 2020 · Updated 6 years ago
- Educational notes, hands-on problems w/ solutions for the Hadoop ecosystem ☆87 · Jan 22, 2019 · Updated 7 years ago
- Create LAMP Stack using terraform with AWS ☆11 · Feb 15, 2023 · Updated 3 years ago
- ☆11 · Dec 14, 2015 · Updated 10 years ago
- Ansible Playbook to create LAMP in CentOS 7 with Apache, MySQL, PHP. ☆10 · Dec 28, 2018 · Updated 7 years ago
- Apache Spark (PySpark) Practice on Real Data ☆271 · Jan 31, 2020 · Updated 6 years ago
- All Certification and preparation, examples & others ☆11 · Oct 18, 2018 · Updated 7 years ago
- Pyspark RDD, DataFrame and Dataset Examples in Python language ☆1,350 · Dec 7, 2025 · Updated 4 months ago
- ☆14 · Aug 24, 2021 · Updated 4 years ago
- ☆12 · Mar 14, 2023 · Updated 3 years ago
- Ansible playbooks for Apache Spark on kube ☆27 · Jul 20, 2017 · Updated 8 years ago
- PySpark Code for Hands-on Learners ☆117 · Nov 3, 2019 · Updated 6 years ago
- Utilities to Retrieve Rulelists from Model Fits, Filter, Prune, Reorder and Predict on unseen data ☆11 · Feb 4, 2025 · Updated last year
- ☆13 · Oct 21, 2020 · Updated 5 years ago
- Unleash the power of GRASS GIS with Jupyter (FOSS4G 2022 workshop) ☆15 · Oct 4, 2023 · Updated 2 years ago
- The official repository for the Rock the JVM Spark Optimization with Scala course ☆57 · Dec 4, 2023 · Updated 2 years ago
- Complete Guide To Mastering Databricks ☆32 · Feb 28, 2026 · Updated last month
- Python API for Informatica PowerCenter (pmrep, pmcmd) ☆21 · Sep 17, 2017 · Updated 8 years ago
- My Reusable Notes ☆26 · Jun 25, 2020 · Updated 5 years ago
- Dashboard to visualize the growth of coronavirus (plotly and dash) ☆12 · May 22, 2023 · Updated 2 years ago
- Notes from 100 days with Kubernetes ☆31 · Jan 25, 2019 · Updated 7 years ago
- All my projects on Big Data are provided ☆27 · Dec 5, 2016 · Updated 9 years ago
- ☆11 · Jun 3, 2025 · Updated 10 months ago
- Kafka-Notes ☆15 · Jun 20, 2021 · Updated 4 years ago
- This data project can be used as a take-home assignment to learn Pyspark and Data Engineering. ☆18 · Feb 19, 2023 · Updated 3 years ago
- Kirk's Zeppelin Notebooks ☆11 · May 22, 2018 · Updated 7 years ago
- Ansible crash course ☆39 · May 3, 2019 · Updated 6 years ago
- This repository contains code for Spark Streaming ☆26 · Mar 11, 2021 · Updated 5 years ago
- [NOT MAINTAINED] Create an ElasticSearch cluster with a simple single bash command. Config through environment variables: RAM, cluster na… ☆59 · Jan 26, 2018 · Updated 8 years ago
- Pexpect is a pure Python module for spawning child applications; controlling them; and responding to expected patterns in their output. ☆38 · Oct 26, 2012 · Updated 13 years ago
- ☆13 · Oct 28, 2025 · Updated 5 months ago
- How to manage Slowly Changing Dimensions with Apache Hive ☆55 · Aug 27, 2019 · Updated 6 years ago
- Jupyter notebooks for pyspark tutorials given at University ☆110 · Jan 7, 2026 · Updated 3 months ago
- Data Science In Investment Banking ☆22 · Sep 20, 2025 · Updated 6 months ago