runawayhorse001 / learning-apache-spark
☆17Updated 7 years ago
Alternatives and similar repositories for learning-apache-spark
Users that are interested in learning-apache-spark are comparing it to the libraries listed below
- Source code for the MC technical blog post "Data Observability in Practice Using SQL"☆38Updated 10 months ago
- A tutorial for the Great Expectations library.☆71Updated 4 years ago
- (project & tutorial) dag pipeline tests + ci/cd setup☆88Updated 4 years ago
- ☆86Updated 2 years ago
- ☆21Updated 4 years ago
- ☆23Updated 2 years ago
- Public source code for the Udemy online course Apache Airflow: Complete Hands-On Beginner to Advanced Class.☆63Updated 4 years ago
- Code for my "Efficient Data Processing in SQL" book.☆56Updated 10 months ago
- Spark and Delta Lake Workshop☆22Updated 2 years ago
- Essential PySpark for Scalable Data Analytics, published by Packt☆45Updated 2 years ago
- Demo of Streamlit application with Databricks SQL Endpoint☆35Updated 2 years ago
- Just starting your DE journey or along the way already? I will be sharing a short list of DATA-ENGINEERING-CENTRED books that cover the…☆34Updated 2 years ago
- Ingesting data with Pulumi, AWS lambdas and Snowflake in a scalable, fully replayable manner☆71Updated 3 years ago
- Examples surrounding Databricks.☆58Updated 11 months ago
- Source Code for 'Applied Data Science Using PySpark' by Ramcharan Kakarla, Sundar Krishnan, and Sridhar Alla☆46Updated 4 years ago
- Data Engineering with Spark and Delta Lake☆99Updated 2 years ago
- Example repo to create end to end tests for data pipeline.☆24Updated 11 months ago
- Data lake, data warehouse on GCP☆56Updated 3 years ago
- Because it's never too late to start taking notes and make them public...☆59Updated this week
- A modern ELT demo using airbyte, dbt, snowflake and dagster☆28Updated 2 years ago
- Scaling Machine Learning in Three Week course in a collaboration with O'Reilly following the guidance of Adi Polak's book - Scaling Machi…☆23Updated 2 years ago
- PySpark Cheatsheet☆36Updated 2 years ago
- Cost Efficient Data Pipelines with DuckDB☆53Updated 3 weeks ago
- Capturing model drift and handling its response - Example webinar☆108Updated 5 years ago
- Fully unit tested utility functions for data engineering. Python 3 only.☆17Updated 9 months ago
- PySpark-ETL☆23Updated 5 years ago
- ☆34Updated 2 years ago
- Data validation library for PySpark 3.0.0☆33Updated 2 years ago
- Data Engineering Capstone Project: ETL Pipelines and Data Warehouse Development☆21Updated 5 years ago
- Big Data Demystified meetup and blog examples☆31Updated 9 months ago