mozilla / docker-etl
Collection of dockerized ETL jobs managed by data engineering.
☆20 · Updated this week
Alternatives and similar repositories for docker-etl:
Users who are interested in docker-etl are comparing it to the repositories listed below.
- ETL jobs for Firefox Telemetry ☆27 · Updated last week
- LookML Generator for Glean and Mozilla Data ☆20 · Updated this week
- Apache Airflow CI pipeline ☆19 · Updated 5 years ago
- End-to-end DataOps platform deployed by Terraform. ☆66 · Updated last month
- Documentation and implementation of telemetry ingestion on Google Cloud Platform ☆83 · Updated last week
- Sample code with integration between Data Catalog and Hive data source. ☆25 · Updated 3 months ago
- [ARCHIVED] The Presto adapter plugin for dbt Core ☆33 · Updated last year
- Airflow configuration for Telemetry ☆186 · Updated last week
- Utility functions for dbt projects running on Spark ☆33 · Updated 2 months ago
- PySpark data-pipeline testing and CICD ☆28 · Updated 4 years ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-orchestration-airflow ☆15 · Updated last year
- Pylint plugin for static code analysis on Airflow code ☆94 · Updated 4 years ago
- ☆47 · Updated last year
- Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level. ☆55 · Updated 2 weeks ago
- A Python package to centralize some Google Cloud Data Catalog scripts, this repo contains commands like bulk CSV operations that help lev… ☆22 · Updated 2 years ago
- Sample code with integration between Data Catalog and RDBMS data sources. ☆72 · Updated 3 years ago
- Event-triggered plugins for Airflow ☆21 · Updated 5 years ago
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-datacatalog ☆52 · Updated last year
- Data validation library for PySpark 3.0.0 ☆33 · Updated 2 years ago
- Weekly Data Engineering Newsletter ☆94 · Updated 9 months ago
- A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran ☆22 · Updated last week
- This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-iam ☆37 · Updated last year
- ☆24 · Updated 5 years ago
- Delta Lake helper methods. No Spark dependency. ☆23 · Updated 8 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆46 · Updated last year
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 · Updated 2 years ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 · Updated last week
- A pytest plugin for dbt adapter test suites ☆19 · Updated last year
- Examples for High Performance Spark ☆15 · Updated 6 months ago
- A bunch of hacks developed around dbt ☆48 · Updated 5 years ago