mozilla / docker-etl
Collection of dockerized ETL jobs managed by data engineering.
☆21 · Updated last week
Alternatives and similar repositories for docker-etl
Users who are interested in docker-etl are comparing it to the repositories listed below.
- Utility functions for dbt projects running on Spark ☆34 · Updated last month
- Weekly Data Engineering Newsletter ☆96 · Updated last year
- Delta Lake examples ☆238 · Updated last year
- PySpark schema generator ☆43 · Updated 2 years ago
- The Picnic Data Vault framework. ☆130 · Updated 3 weeks ago
- A DBT package to perform DataOps & administrative CI/CD on your data warehouse. ☆16 · Updated 4 years ago
- Full stack data engineering tools and infrastructure set-up ☆57 · Updated 4 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆78 · Updated 2 years ago
- Data validation library for PySpark 3.0.0 ☆33 · Updated 3 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆115 · Updated this week
- New generation opensource data stack ☆76 · Updated 3 years ago
- Delta Lake Documentation ☆53 · Updated last year
- A bunch of hacks developed around dbt ☆48 · Updated 6 years ago
- Data Tools Subjective List ☆89 · Updated 2 years ago
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆32 · Updated 2 years ago
- ☆23 · Updated 4 years ago
- The go to demo for public and private dbt Learn ☆81 · Updated 10 months ago
- Delta Lake helper methods. No Spark dependency. ☆22 · Updated 2 weeks ago
- Make simple storing test results and visualisation of these in a BI dashboard ☆52 · Updated last month
- Palm CLI - the tool-belt for data teams ☆47 · Updated last year
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 · Updated 2 years ago
- Supporting materials/code examples for my course in data engineering for machine learning. ☆39 · Updated 3 years ago
- A guide for leading a data (engineering) team ☆64 · Updated last year
- Pipeline definitions for managing data flows to power analytics at MIT Open Learning ☆45 · Updated this week
- Data-aware orchestration with dagster, dbt, and airbyte ☆31 · Updated 3 years ago
- Great Expectations Airflow operator ☆170 · Updated last week
- A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows. ☆83 · Updated last year
- Fake Pandas / PySpark DataFrame creator ☆48 · Updated last year
- Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes ☆63 · Updated 3 years ago
- An open specification for data products in Data Mesh ☆63 · Updated 4 months ago