PySpark boilerplate for running a production-ready data pipeline
☆29Mar 17, 2021Updated 5 years ago
Alternatives and similar repositories for pyspark-boilerplate-mehdio
Users that are interested in pyspark-boilerplate-mehdio are comparing it to the libraries listed below.
- Boilerplate for PySpark on Cloud Kubernetes☆33Oct 12, 2021Updated 4 years ago
- A containerized approach using Apache Kafka, Spark, Cassandra, Hive, Jupyter, and Docker-compose.☆14Apr 14, 2021Updated 5 years ago
- [student project] UI to run SQL on Delta Lake tables and visualize how the results vary across table versions☆12Apr 21, 2020Updated 6 years ago
- Local Development of AWS Glue with Docker and Visual Studio Code☆14Nov 29, 2021Updated 4 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- Covid19 and Iowa Liquor Sales analysis at BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools☆14Jun 18, 2022Updated 3 years ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario…☆27Mar 17, 2026Updated 2 months ago
- Rasa Chatbot using Django backend and Sockets for communication☆12Dec 8, 2022Updated 3 years ago
- Test API using Fast API library.☆14Apr 10, 2022Updated 4 years ago
- Examples and Quick Starts for Snowflake☆11Apr 4, 2026Updated last month
- repo with resources from Understanding Data with Alex Merced videos☆14Jan 20, 2024Updated 2 years ago
- ☆15Aug 22, 2022Updated 3 years ago
- Finetuning and Inference of Llama2 7b model on colab☆14Jul 19, 2023Updated 2 years ago
- Distributed stock price forecasting system to predict S&P 500 stock prices.☆11Nov 12, 2021Updated 4 years ago
- ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…☆11Mar 9, 2022Updated 4 years ago
- Power Plant ML Pipeline Application - Apache Spark☆12Dec 12, 2016Updated 9 years ago
- An implementation of apriori algorithm under spark platform☆11Dec 13, 2018Updated 7 years ago
- A FastAPI boilerplate application☆11Sep 5, 2020Updated 5 years ago
- Sample RESTful API for NodeSchool Workshop☆15Sep 13, 2016Updated 9 years ago
- ☆31Oct 29, 2018Updated 7 years ago
- Learn Spanish conjugation the easy way☆19Aug 16, 2016Updated 9 years ago
- PredictorFinc is a scalable supervised machine learning model that predicts stock price change through Decision Tree Regressor using data …☆12Sep 5, 2023Updated 2 years ago
- Collection of notebooks☆17Oct 27, 2024Updated last year
- Spark Projects for the Berkeley Data Science Course☆13Aug 12, 2015Updated 10 years ago
- ☆13Feb 19, 2025Updated last year
- Coder in your OpenShift/Kubernetes Cluster or in Docker☆10Oct 22, 2021Updated 4 years ago
- ☆16Apr 26, 2024Updated 2 years ago
- Run a Spark job within Amazon EMR☆12Sep 12, 2020Updated 5 years ago
- Generate proxy versions of cards/decks you are interested in purchasing!☆16May 18, 2025Updated last year
- This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…☆15Nov 15, 2021Updated 4 years ago
- ☆16Nov 17, 2017Updated 8 years ago
- Marshmallow serializer integration with pyspark☆12Dec 29, 2023Updated 2 years ago
- Helm Chart for deploying Spark history server in Amazon EKS for S3 Spark Event Logs☆29Apr 4, 2026Updated last month
- Methods for mapping proteomics data on 3D protein structure.☆15Jan 18, 2020Updated 6 years ago
- Medieval strategy autobattling deckbuilder. My First Game Jam: Summer 2020.☆17Apr 28, 2021Updated 5 years ago
- A simple, working, 32-bit ALU design.☆14Dec 26, 2014Updated 11 years ago
- A tutorial on building a real-time data streaming application pipeline with Apache Kafka🔥🔥🔥☆24Apr 29, 2022Updated 4 years ago
- Mobile robot data were analyzed with Apache Spark to extract five different statistical results such as travel time, waiting time, average…☆15Apr 5, 2022Updated 4 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆56May 6, 2023Updated 3 years ago