PySpark boilerplate for running a production-ready data pipeline
☆29Mar 17, 2021Updated 5 years ago
Alternatives and similar repositories for pyspark-boilerplate-mehdio
Users that are interested in pyspark-boilerplate-mehdio are comparing it to the libraries listed below.
- Boilerplate for PySpark on Cloud Kubernetes☆33Oct 12, 2021Updated 4 years ago
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Sep 7, 2022Updated 3 years ago
- A Gentle introduction to Machine Learning with Apache Spark☆11Mar 2, 2026Updated last month
- Visits sessionization pipeline used for the talk☆13May 28, 2024Updated last year
- A content-based recommender system for books using the Project Gutenberg text corpus☆29Feb 20, 2017Updated 9 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks.☆10Oct 8, 2022Updated 3 years ago
- Covid19 and Iowa Liquor Sales analysis at BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools☆14Jun 18, 2022Updated 3 years ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario…☆27Mar 17, 2026Updated 3 weeks ago
- Rasa Chatbot using Django backend and Sockets for communication☆12Dec 8, 2022Updated 3 years ago
- Test API using Fast API library.☆14Apr 10, 2022Updated 3 years ago
- Examples and Quick Starts for Snowflake☆11Updated this week
- repo with resources from Understanding Data with Alex Merced videos☆14Jan 20, 2024Updated 2 years ago
- Finetuning and Inference of Llama2 7b model on colab☆14Jul 19, 2023Updated 2 years ago
- ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…☆11Mar 9, 2022Updated 4 years ago
- A repository of strategies that can be used to automate intra day trades in the National Stock Exchange using the KiteConnect API by Zero…☆16Apr 19, 2021Updated 4 years ago
- Jupyter Notebook with Spark support extracted from jupyter/docker-stack☆19Jul 4, 2018Updated 7 years ago
- A FastAPI boilerplate application☆12Sep 5, 2020Updated 5 years ago
- Sample RESTful API for NodeSchool Workshop☆15Sep 13, 2016Updated 9 years ago
- A fast and low memory requirement version of PointHop and PointHop++, which is built upon Apache Spark.☆10Jul 14, 2020Updated 5 years ago
- Example project that uses v-leaflet to edit JPA entities in a basic Vaadin app☆12Mar 17, 2016Updated 10 years ago
- A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.☆11Jul 4, 2021Updated 4 years ago
- Spark Projects for the Berkeley Data Science Course☆13Aug 12, 2015Updated 10 years ago
- ☆13Feb 19, 2025Updated last year
- End to end data pipeline☆22Apr 13, 2025Updated 11 months ago
- Analyzing Big Data with Amazon EMR☆12Sep 14, 2020Updated 5 years ago
- ☆16Nov 17, 2017Updated 8 years ago
- Marshmallow serializer integration with pyspark☆12Dec 29, 2023Updated 2 years ago
- noiseprint2 is a port of noiseprint to tensorflow 2 and keras☆12Feb 20, 2021Updated 5 years ago
- Methods for mapping proteomics data on 3D protein structure.☆15Jan 18, 2020Updated 6 years ago
- Climb aboard the JWT Express and use JWTs in your Express app with ease!☆10Dec 11, 2017Updated 8 years ago
- A simple, working, 32-bit ALU design.☆14Dec 26, 2014Updated 11 years ago
- Mobile robot data were analyzed with Apache-Spark to extract five different statistical results such as travel time, waiting time, average…☆15Apr 5, 2022Updated 4 years ago
- A tutorial on building a real-time data streaming application pipeline with Apache Kafka🔥🔥🔥☆24Apr 29, 2022Updated 3 years ago
- Repository dedicated to a Data Lakehouse workshop with Delta Lake☆17Dec 6, 2021Updated 4 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆56May 6, 2023Updated 2 years ago
- Use aws-emr and aws-redshift to analyse dataset of adult census of USA☆13Sep 11, 2020Updated 5 years ago
- Node script to delete unused node_modules folder☆12Jul 6, 2022Updated 3 years ago
- Custom kube-scheduler for binpacking targeting Spark on EKS and other jobs workloads☆26Feb 24, 2026Updated last month
- ☆22Feb 7, 2024Updated 2 years ago