emmc15 / pyspark-testing-env
Example repo for full end-to-end PySpark testing via docker-compose
☆32 Updated 2 years ago
Alternatives and similar repositories for pyspark-testing-env:
Users who are interested in pyspark-testing-env are comparing it to the libraries listed below.
- Delta Lake helper methods in PySpark ☆322 Updated 7 months ago
- A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow. ☆195 Updated this week
- A curated collection of publicly available resources on dbt best practices and how data-driven organizations around the world utilize dbt ☆113 Updated 3 years ago
- A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros. ☆184 Updated last year
- Slow & local data allows you to move fast and deliver business value for the 99.9% of the data challenges. ☆202 Updated last week
- Showcase of advanced use cases relating to CI in dbt ☆77 Updated last week
- Quickstart for any service ☆142 Updated this week
- An integration for dbt and fzf that allows interactive selection and search of dbt models. ☆70 Updated last year
- A Python Library to support running data quality rules while the spark job is running⚡ ☆181 Updated this week
- Schema modelling framework for decentralised domain-driven ownership of data. ☆252 Updated last year
- Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principle… ☆109 Updated 2 weeks ago
- Adds autocompletion to the dbt CLI ☆128 Updated 2 months ago
- Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase ☆126 Updated 2 years ago
- Pythonic Programming Framework to orchestrate jobs in Databricks Workflow ☆213 Updated last week
- This package generates database constraints based on the tests in a dbt project ☆156 Updated 4 months ago
- Linter for dbt metadata ☆142 Updated last week
- ☆76 Updated 6 months ago
- Fake Snowflake Connector for Python. Run, mock and test Snowflake DB locally. ☆126 Updated last week
- A dbt-core plugin to weave together multi-project dbt-core deployments ☆143 Updated last week
- CLI tool for dbt users to simplify creation of staging models (yml and sql) files ☆261 Updated last week
- A lightweight Python-based tool for extracting and analyzing data column lineage for dbt projects ☆153 Updated 2 weeks ago
- A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects ☆219 Updated this week
- Template for Data Engineering and Data Pipeline projects ☆109 Updated 2 years ago
- The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for sever… ☆241 Updated 2 months ago
- Great Expectations Airflow operator ☆162 Updated last week
- Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testin… ☆64 Updated last year
- A dbt SQL package for ensuring documentation and test coverage, with granular control. ☆123 Updated 2 years ago
- Snowflake-specific utility macros for dbt projects. ☆108 Updated 9 months ago
- ☆43 Updated 3 years ago
- Open Data Stack Projects: Examples of End to End Data Engineering Projects ☆79 Updated last year