mistercrunch / awesome-data-engineering
A curated list of data engineering tools for software developers
☆497 · Updated 8 years ago
Alternatives and similar repositories for awesome-data-engineering
Users who are interested in awesome-data-engineering are comparing it to the repositories listed below.
- ETL best practices with airflow, with examples ☆1,354 · Updated last year
- ☆201 · Updated 2 years ago
- Apache Airflow integration for dbt ☆411 · Updated last year
- Airflow Unit Tests and Integration Tests ☆261 · Updated 3 years ago
- Example DAGs using hooks and operators from Airflow Plugins ☆348 · Updated 7 years ago
- Assets related to the operation of Fishtown Analytics. ☆419 · Updated last year
- Data ingestion library for Amundsen to build graph and search index ☆204 · Updated last year
- pyspark methods to enhance developer productivity 📣 👯 🎉 ☆682 · Updated 10 months ago
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆534 · Updated this week
- Construct Apache Airflow DAGs Declaratively via YAML configuration files ☆1,413 · Updated last week
- Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes ☆488 · Updated this week
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 · Updated 2 years ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆443 · Updated 6 months ago
- The easiest way to run Airflow locally, with linting & tests for valid DAGs and Plugins. ☆258 · Updated 4 years ago
- Airflow basics tutorial ☆397 · Updated 4 years ago
- Collection of dbt Tips and Tricks ☆399 · Updated 3 years ago
- A boilerplate for writing PySpark Jobs ☆395 · Updated 2 years ago
- Python API for Deequ ☆809 · Updated last week
- PySpark test helper methods with beautiful error messages ☆750 · Updated 2 weeks ago
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆259 · Updated 2 years ago
- Airflow training for the crunch conf ☆105 · Updated 7 years ago
- Spark style guide ☆272 · Updated last year
- Fast iterative local development and testing of Apache Airflow workflows ☆203 · Updated last month
- Front-end service library for Amundsen ☆278 · Updated last week
- A guide to running Airflow on Kubernetes ☆174 · Updated 6 years ago
- A series of DAGs/Workflows to help maintain the operation of Airflow ☆1,763 · Updated last year
- Redshift package for dbt (getdbt.com) ☆102 · Updated last year
- Performant Redshift data source for Apache Spark ☆141 · Updated 2 weeks ago
- A curated collection of publicly available resources on dbt best practices and how data-driven organizations around the world utilize dbt ☆115 · Updated 3 years ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. ☆420 · Updated last week