PySpark boilerplate for running a production-ready data pipeline
☆29Mar 17, 2021Updated 5 years ago
Alternatives and similar repositories for pyspark-boilerplate-mehdio
Users that are interested in pyspark-boilerplate-mehdio are comparing it to the libraries listed below.
- A containerized approach using Apache Kafka, Spark, Cassandra, Hive, Jupyter, and Docker-compose.☆14Apr 14, 2021Updated 5 years ago
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Sep 7, 2022Updated 3 years ago
- A Gentle introduction to Machine Learning with Apache Spark☆11Mar 2, 2026Updated 3 months ago
- Visits sessionization pipeline used for the talk☆13May 28, 2024Updated 2 years ago
- Object Detection Video with TensorFlow☆13Nov 17, 2018Updated 7 years ago
- [student project] UI to run SQL on Delta Lake tables and visualize the variations of the result among tables versions☆12Apr 21, 2020Updated 6 years ago
- Local Development of AWS Glue with Docker and Visual Studio Code☆14Nov 29, 2021Updated 4 years ago
- A content-based recommender system for books using the Project Gutenberg text corpus☆29Feb 20, 2017Updated 9 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- THIS PROJECT IS ABOUT TURKISH SENTIMENT ANALYSIS☆14Aug 23, 2019Updated 6 years ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario…☆28Mar 17, 2026Updated 2 months ago
- Examples and Quick Starts for Snowflake☆11Updated this week
- Distributed stock price forecasting system to predict S&P 500 stock prices.☆11Nov 12, 2021Updated 4 years ago
- ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…☆11Mar 9, 2022Updated 4 years ago
- Power Plant ML Pipeline Application - Apache Spark☆12Dec 12, 2016Updated 9 years ago
- An implementation of apriori algorithm under spark platform☆11Dec 13, 2018Updated 7 years ago
- Jupyter Notebook with Spark support extracted from jupyter/docker-stack☆19Jul 4, 2018Updated 7 years ago
- A FastAPI boilerplate application☆11Sep 5, 2020Updated 5 years ago
- Sample RESTful API for NodeSchool Workshop☆15Sep 13, 2016Updated 9 years ago
- Scala Real Time Bidding System using open-rtb protocol (openrtb) [IAB open RTB 2.3 specs] - Simulation☆13Jun 27, 2020Updated 5 years ago
- ☆31Oct 29, 2018Updated 7 years ago
- PredictorFinc is a scalable supervised machine learning model that predicts stock price change through Decision Tree Regressor using data …☆12Sep 5, 2023Updated 2 years ago
- the new danlevy.net☆15Updated this week
- Collection of notebooks☆17Oct 27, 2024Updated last year
- Writing PySpark logs in Apache Spark and Databricks☆17Jun 13, 2022Updated 3 years ago
- A fast and low memory requirement version of PointHop and PointHop++, which is built upon Apache Spark.☆10Jul 14, 2020Updated 5 years ago
- Spark Projects for the Berkeley Data Science Course☆13Aug 12, 2015Updated 10 years ago
- End to end data pipeline☆22Apr 13, 2025Updated last year
- ☆16Apr 26, 2024Updated 2 years ago
- Example project for consuming AWS Kinesis streaming and saving data to Amazon Redshift using Apache Spark☆11May 22, 2018Updated 8 years ago
- Run a Spark job within Amazon EMR☆12Sep 12, 2020Updated 5 years ago
- Analyzing Big Data with Amazon EMR☆12Sep 14, 2020Updated 5 years ago
- This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…☆15Nov 15, 2021Updated 4 years ago
- ☆16Nov 17, 2017Updated 8 years ago
- Marshmallow serializer integration with pyspark☆12Dec 29, 2023Updated 2 years ago
- A simple TUI for stow☆16Apr 13, 2021Updated 5 years ago
- Helm Chart for deploying Spark history server in Amazon EKS for S3 Spark Event Logs☆29Apr 4, 2026Updated 2 months ago
- Analysis of City Of Chicago Taxi Trip Dataset Using AWS EMR, Spark, PySpark, Zeppelin and Airbnb's Superset☆15Jul 16, 2017Updated 8 years ago
- This repo provides the Kubernetes Helm chart for deploying Pyspark Notebook.☆17Nov 16, 2022Updated 3 years ago