garystafford / streaming-sales-generator
Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python
☆44 · Updated 2 years ago
Alternatives and similar repositories for streaming-sales-generator
Users who are interested in streaming-sales-generator are comparing it to the repositories listed below
- Delta Lake Documentation ☆49 · Updated 10 months ago
- Code snippets for Data Engineering Design Patterns book ☆106 · Updated last month
- ☆81 · Updated 4 months ago
- ☆53 · Updated 9 months ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆27 · Updated 8 months ago
- Delta Lake examples ☆224 · Updated 7 months ago
- Code snippets used in demos recorded for the blog. ☆37 · Updated 2 weeks ago
- Quick guides from Dremio on several topics ☆71 · Updated 3 months ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆66 · Updated 3 years ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆72 · Updated 2 weeks ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆176 · Updated 3 years ago
- PySpark boilerplate for running a prod-ready data pipeline ☆28 · Updated 4 years ago
- A Python library to support running data quality rules while the Spark job is running⚡ ☆188 · Updated last week
- A table-format-agnostic data sharing framework ☆38 · Updated last year
- Apache Flink (PyFlink) and Related Projects ☆38 · Updated last month
- Soda Spark is a PySpark library that helps you with testing your data in Spark DataFrames ☆63 · Updated 2 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 · Updated 2 years ago
- dbt (data build tool) projects targeting AWS analytics services (Redshift, Glue, EMR, Athena) and open table formats ☆29 · Updated 2 years ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆98 · Updated 2 years ago
- Covid19 and Iowa Liquor Sales analysis in BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools ☆14 · Updated 2 years ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆75 · Updated 3 years ago
- A repository of sample code to accompany our blog post on Airflow and dbt. ☆172 · Updated last year
- Data Engineering with Spark and Delta Lake ☆98 · Updated 2 years ago
- New generation open-source data stack ☆67 · Updated 2 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 · Updated 2 years ago
- Simple stream processing pipeline ☆102 · Updated 10 months ago
- An example dbt project using AutomateDV to create a Data Vault 2.0 Data Warehouse based on the Snowflake TPC-H dataset. ☆50 · Updated last year
- ☆18 · Updated last year
- Spark runtime on AWS Lambda ☆107 · Updated 7 months ago
- 📚 Tech blogs & talks by companies that run Apache Flink in production ☆172 · Updated 3 months ago