best-practice-and-impact / ons-spark
☆16 Updated last month
Alternatives and similar repositories for ons-spark
Users that are interested in ons-spark are comparing it to the libraries listed below
- A suite of PySpark, Pandas, and general pipeline utils for ONS projects. ☆18 Updated last month
- Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake ☆243 Updated 5 months ago
- Generate synthetic Spotify music stream dataset to create dashboards. Spotify API generates fake event data emitted to Kafka. Spark consu… ☆69 Updated last year
- PySpark test helper methods with beautiful error messages ☆730 Updated 2 months ago
- This repository goes over how to handle massive variety in data engineering ☆307 Updated 2 years ago
- An open-source logical data modeler to support the model driven data engineering approach. ☆12 Updated this week
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆90 Updated last year
- Demo project for dbt on Databricks ☆32 Updated 5 years ago
- Custom PySpark Data Sources ☆81 Updated 3 weeks ago
- ☆13 Updated last year
- The resources of the preparation course for Databricks Data Engineer Associate certification exam ☆527 Updated 2 months ago
- ☆832 Updated 7 months ago
- This project is for demonstrating knowledge of Data Engineering tools and concepts and also learning in the process ☆47 Updated 2 years ago
- Enforce Data Contracts ☆741 Updated this week
- Code for "Efficient Data Processing in Spark" Course ☆347 Updated last month
- ☆40 Updated 2 years ago
- ☆378 Updated 10 months ago
- Edit Open Data Contract Standard in Excel ☆29 Updated 3 months ago
- Data Engineering Project: Extracting music video metrics of Twice using YouTube API, AWS, and Tableau ☆30 Updated 2 years ago
- A curated list of awesome dbt resources ☆1,591 Updated last month
- Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp Udemy course ☆717 Updated last week
- Project documentation templates derived from CRISP-DM for use in Data Engineering projects. ☆58 Updated 4 years ago
- Apartments Data Pipeline using Airflow and Spark. ☆23 Updated 3 years ago
- ☆162 Updated 3 years ago
- ☆141 Updated 9 months ago
- Get data from API, run a scheduled script with Airflow, send data to Kafka and consume with Spark, then write to Cassandra ☆143 Updated 2 years ago
- PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster ☆481 Updated last year
- Code to demonstrate data engineering metadata & logging best practices ☆17 Updated last year
- Beginner data engineering project - batch edition ☆550 Updated 10 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆17 Updated last year