whole-tale / all-spark-notebook
Jupyter Notebook with Spark support extracted from jupyter/docker-stack
☆19 Updated 7 years ago
Alternatives and similar repositories for all-spark-notebook
Users who are interested in all-spark-notebook are comparing it to the repositories listed below.
- Dockerizing an Apache Spark Standalone Cluster ☆43 Updated 3 years ago
- Repo that relates to the Medium blog 'Keeping your ML model in shape with Kafka, Airflow and MLFlow' ☆121 Updated 2 years ago
- Source code for the MC technical blog post "Data Observability in Practice Using SQL" ☆38 Updated last year
- A proof of concept for how to set up a codebase for an analytics org. ☆14 Updated 3 years ago
- ☆19 Updated 4 years ago
- ☆48 Updated 3 years ago
- Data Science Quick Tips Repository! ☆48 Updated last year
- Best practices for engineering ML pipelines. ☆35 Updated 3 years ago
- How to use Python to understand data and transform the data into a tidy format ready to be used for modelling and visualisation. ☆37 Updated 6 years ago
- PyConDE & PyData Berlin 2019 Airflow Workshop: Airflow for machine learning pipelines. ☆47 Updated last year
- Blog post on ETL pipelines with Airflow ☆23 Updated 5 years ago
- A data pipeline moving data from a relational database system (RDBMS) to a Hadoop file system (HDFS). ☆15 Updated 4 years ago
- A series of Jupyter notebooks that walk you through Machine Learning with Apache Spark ecosystem using Spark MLlib, PyTorch and TensorFlo… ☆83 Updated last year
- The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow. ☆52 Updated 2 years ago
- Jupyter notebooks for pyspark tutorials given at University ☆108 Updated last week
- Creating a Gradio user interface to predict the sentiment of a tweet ☆12 Updated 3 years ago
- Cost Efficient Data Pipelines with DuckDB ☆55 Updated 2 months ago
- Batch processing, orchestration using Apache Airflow and Google Workflows, Spark Structured Streaming and a lot more ☆18 Updated 3 years ago
- Small data engineering tutorial ☆10 Updated 6 years ago
- Just a boilerplate for PySpark and Flask ☆35 Updated 6 years ago
- Full stack data engineering tools and infrastructure set-up ☆53 Updated 4 years ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆22 Updated 2 years ago
- Notebooks for the ML Link Prediction Course ☆14 Updated 4 years ago
- Check the basic quality of any dataset ☆11 Updated 4 years ago
- Singapore Condo Rental Prices - From Data Acquisition to Prediction ☆13 Updated 4 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 Updated 2 years ago
- 🐍💨 Airflow tutorial for PyCon 2019 ☆86 Updated 2 years ago
- Study notes and demos. ☆12 Updated last year
- A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for … ☆137 Updated 5 years ago
- ☆13 Updated 4 years ago