emmc15 / pyspark-testing-env
Example repo for full end-to-end PySpark testing via docker-compose
☆31 Updated 3 years ago
Alternatives and similar repositories for pyspark-testing-env
Users who are interested in pyspark-testing-env are comparing it to the libraries listed below
- PySpark test helper methods with beautiful error messages ☆752 Updated 3 weeks ago
- Delta Lake helper methods in PySpark ☆327 Updated 3 weeks ago
- A Python Library to support running data quality rules while the spark job is running⚡ ☆197 Updated this week
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆226 Updated last week
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆279 Updated 4 months ago
- Quickstart for any service ☆167 Updated this week
- The easiest way to run Airflow locally, with linting & tests for valid DAGs and Plugins. ☆258 Updated 4 years ago
- A portable Datamart and Business Intelligence suite built with Docker, Dagster, dbt, DuckDB and Superset ☆258 Updated last month
- Make dbt great again! Extend dbt with plugins, local docs and custom adapters: fast, safe, and developer-friendly ☆280 Updated last week
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow. ☆211 Updated last month
- A lightweight Python-based tool for extracting and analyzing data column lineage for dbt projects ☆194 Updated 10 months ago
- A Python package that creates fine-grained dbt tasks on Apache Airflow ☆82 Updated 2 weeks ago
- Dagster Labs' open-source data platform, built with Dagster. ☆437 Updated this week
- Code for "Efficient Data Processing in Spark" Course ☆361 Updated 3 months ago
- ☆42 Updated 4 years ago
- Template for a data contract used in a data mesh. ☆486 Updated last year
- Slow & local data allows you to move fast and deliver business value for the 99.9% of the data challenges. ☆347 Updated 4 months ago
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆225 Updated 9 months ago
- Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principle… ☆124 Updated 10 months ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆250 Updated last year
- This dbt package contains macros to support unit testing that can be (re)used across dbt projects. ☆448 Updated last year
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆682 Updated 11 months ago
- Apache Airflow integration for dbt ☆411 Updated last year
- Linter for dbt metadata ☆207 Updated last month
- Turning PySpark Into a Universal DataFrame API ☆485 Updated last week
- Python API for Deequ ☆810 Updated 3 weeks ago
- This package contains macros and models to find DAG issues automatically ☆528 Updated last week
- Useful macros when performing data audits ☆393 Updated 3 weeks ago
- A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts ☆388 Updated last week
- This dbt package captures metadata, artifacts, and test results so you can detect anomalies, monitor data quality, and build metadata tab… ☆484 Updated last week