whole-tale / all-spark-notebook
Jupyter Notebook with Spark support extracted from jupyter/docker-stacks
☆18Updated 6 years ago
Alternatives and similar repositories for all-spark-notebook:
Users who are interested in all-spark-notebook are comparing it to the libraries listed below
- Sentiment Analysis of a Twitter Topic with Spark Structured Streaming☆55Updated 6 years ago
- ☆16Updated last year
- Delta-Lake, ETL, Spark, Airflow☆46Updated 2 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio…☆53Updated last year
- ☆49Updated 3 years ago
- Code for my presentation: Using PySpark to Process Boat Loads of Data☆20Updated 7 years ago
- Batch processing, orchestration using Apache Airflow and Google Workflows, Spark Structured Streaming, and a lot more☆19Updated 2 years ago
- Just a boilerplate for PySpark and Flask☆35Updated 6 years ago
- Docker compose and Google Colab demo to build a CDC with Delta Lake☆15Updated 2 years ago
- code, labs and lectures for the course☆46Updated last year
- Repository used for Spark Trainings☆53Updated last year
- A repository for a PySpark Cookbook by Tomasz Drabas and Denny Lee☆60Updated 6 years ago
- Repo that relates to the Medium blog 'Keeping your ML model in shape with Kafka, Airflow and MLFlow'☆119Updated last year
- The repository for the course in Udemy☆16Updated 5 years ago
- This repo is an approach to TDD in machine learning model operation. It covers project structure, testing essentials using pytest with Gi…☆15Updated 4 years ago
- Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real-time☆70Updated 8 years ago
- ☆23Updated 2 years ago
- Apache Spark Structured Streaming with Kafka using Python (PySpark)☆41Updated 5 years ago
- Dockerizing an Apache Spark Standalone Cluster☆43Updated 2 years ago
- The source code for the book Modern Data Engineering with Apache Spark☆35Updated 2 years ago
- Work for Mastering Large Datasets with Python☆18Updated 2 years ago
- ETL pipeline using PySpark (Spark - Python)☆113Updated 4 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟☆53Updated 3 years ago
- Quick Guides from Dremio on Several topics☆69Updated 2 months ago
- Source code for the MC technical blog post "Data Observability in Practice Using SQL"☆36Updated 8 months ago
- Analyzing Clickstream Data using Markov Chains and data mining SPACE algorithm☆29Updated 6 years ago
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines.☆47Updated last year
- Blog post on ETL pipelines with Airflow☆23Updated 4 years ago
- Design/Implement stream/batch architecture on NYC taxi data | #DE☆25Updated 3 years ago
- Code to build a simple analytics data pipeline with Python☆102Updated 8 years ago