sodadata / soda-streaming
☆22 · Updated 3 years ago
Alternatives and similar repositories for soda-streaming:
Users who are interested in soda-streaming are comparing it to the libraries listed below.
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 · Updated 2 years ago
- An open specification for data products in Data Mesh ☆55 · Updated 2 months ago
- Declarative, text-based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆77 · Updated this week
- A Python library that supports running data quality rules while the Spark job is running ⚡ ☆168 · Updated last week
- ☆94 · Updated last year
- Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pi… ☆95 · Updated 2 weeks ago
- Flowchart for debugging Spark applications ☆104 · Updated 4 months ago
- Library to convert DBT manifest metadata to Airflow tasks ☆47 · Updated 10 months ago
- Extensible streaming ingestion pipeline on top of Apache Spark ☆44 · Updated 10 months ago
- Rokku project. This project acts as a proxy on top of any S3 storage solution providing services like authentication, authorization, shor… ☆66 · Updated 11 months ago
- Delta lake and filesystem helper methods ☆50 · Updated 11 months ago
- ☆47 · Updated 5 months ago
- Covid19 and Iowa Liquor Sales analysis at BigQuery using dbt, Airflow, Marquez, Google Cloud and other modern data stack tools ☆14 · Updated 2 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆43 · Updated 2 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆23 · Updated 5 months ago
- Delta Lake helper methods. No Spark dependency. ☆22 · Updated 4 months ago
- Data validation library for PySpark 3.0.0 ☆34 · Updated 2 years ago
- Makes it simple to store test results and visualise them in a BI dashboard ☆40 · Updated last month
- Smart Automation Tool for building modern Data Lakes and Data Pipelines ☆114 · Updated this week
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 · Updated last year
- Mapping of DWH database tables to business entities, attributes & metrics in Python, with automatic creation of flattened tables ☆72 · Updated last year
- ☆79 · Updated last year
- The official repository for the Rock the JVM Spark Optimization 2 course ☆38 · Updated last year
- A simple Spark-powered ETL framework that just works 🍺 ☆178 · Updated last year
- Spark functions to run popular phonetic and string matching algorithms ☆60 · Updated 2 years ago
- Basic framework utilities to quickly start writing production-ready Apache Spark applications ☆35 · Updated last month
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆65 · Updated 3 years ago
- Airflow Providers containing Deferrable Operators & Sensors from Astronomer ☆142 · Updated this week
- Spark on Kubernetes using Helm ☆34 · Updated 4 years ago
- PySpark boilerplate for running a production-ready data pipeline ☆28 · Updated 3 years ago