Code repository for the "PySpark in Action" book
☆214Jun 11, 2025Updated 8 months ago
Alternatives and similar repositories for DataAnalysisWithPythonAndPySpark
Users that are interested in DataAnalysisWithPythonAndPySpark are comparing it to the libraries listed below
- Data for the `Data Analysis with Python and PySpark` book☆41Jan 9, 2023Updated 3 years ago
- ☆24Dec 21, 2020Updated 5 years ago
- Data Wrangling with Python 3.x, published by Packt☆19Jan 30, 2023Updated 3 years ago
- Data Labeling in Machine Learning with Python, by Packt Publishing☆23Feb 5, 2026Updated 3 weeks ago
- ☆13Feb 27, 2024Updated 2 years ago
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian☆229Jun 26, 2023Updated 2 years ago
- ☆11Oct 6, 2023Updated 2 years ago
- This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…☆15Nov 15, 2021Updated 4 years ago
- Notes on how to set up your backend instance☆12May 29, 2024Updated last year
- Code for the second edition of Data Pipelines with Apache Airflow Book☆39Feb 11, 2026Updated 2 weeks ago
- Code for Data Pipelines with Apache Airflow☆813Aug 15, 2024Updated last year
- Utility code for use with PyXLL☆10Nov 3, 2020Updated 5 years ago
- Official repository of the Manning book - Fight Fraud with Machine Learning - by Ashish Ranjan Jha☆19May 24, 2025Updated 9 months ago
- Mobile robot data were analyzed with Apache Spark to extract five different statistical results such as travel time, waiting time, average…☆15Apr 5, 2022Updated 3 years ago
- demo material for my PowerShell Scripting Secrets presentation☆15Nov 20, 2017Updated 8 years ago
- Machine Learning Engineering with MLflow, published by Packt☆123Feb 5, 2026Updated 3 weeks ago
- Files for my PyTorch book☆43Dec 3, 2025Updated 2 months ago
- Source Code for 'Azure Data Factory' by Example by Richard Swinbank☆17Jun 21, 2021Updated 4 years ago
- PyRapidML is an open source Python library which not only helps in automating Machine Learning Workflows but also helps in building end t…☆14Aug 7, 2021Updated 4 years ago
- Source Code for 'Practical Haskell, 3rd Edition' by Alejandro Serrano Mena☆13Oct 11, 2022Updated 3 years ago
- Contains different projects about data science.☆14Nov 16, 2024Updated last year
- A Docker Compose Consul network definition☆11Mar 2, 2018Updated 7 years ago
- KNIME Deep Learning Integration☆24Updated this week
- Python☆13Oct 27, 2023Updated 2 years ago
- Reference code base for ML Engineering, Manning Publications☆137Jul 16, 2021Updated 4 years ago
- ☆17Jan 24, 2023Updated 3 years ago
- Building a real-time alert monitoring pipeline that sends email notifications off of Azure Event Hubs, Azure Databricks, and an Azure Logi…☆13Mar 8, 2020Updated 5 years ago
- Table detection with Florence.☆15Jul 11, 2024Updated last year
- Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University☆166Dec 4, 2025Updated 2 months ago
- Example code for the O'Reilly book Hypermodern Python Tooling☆20Mar 29, 2025Updated 10 months ago
- ☆13Nov 20, 2020Updated 5 years ago
- This is an ETL application on AWS with general open sales and customer data that you can find here: https://github.com/camposvinicius/dat…☆18Feb 7, 2022Updated 4 years ago
- Writing PySpark logs in Apache Spark and Databricks☆17Jun 13, 2022Updated 3 years ago
- Unlocking the Secrets of Prompt Engineering, Published by Packt☆23Dec 14, 2023Updated 2 years ago
- A Flink application that demonstrates reading and writing to/from Apache Kafka with Apache Flink☆20Jul 23, 2023Updated 2 years ago
- Create and manage Amazon SageMaker HyperPod clusters, run distributed model training☆24Jan 29, 2026Updated 3 weeks ago
- Spark Databricks Notebooks☆14Dec 19, 2020Updated 5 years ago
- Essential PySpark for Scalable Data Analytics, published by Packt☆46Jan 30, 2023Updated 3 years ago
- Apache Airflow Best Practices, published by Packt☆51Nov 4, 2024Updated last year