mistercrunch / awesome-data-engineering
A curated list of data engineering tools for software developers
☆495 Updated 8 years ago
Alternatives and similar repositories for awesome-data-engineering
Users who are interested in awesome-data-engineering are comparing it to the libraries listed below.
- Guides and docs to help you get up and running with Apache Airflow. ☆815 Updated last month
- ETL best practices with Airflow, with examples ☆1,353 Updated last year
- ☆202 Updated 2 years ago
- Apache Airflow integration for dbt ☆412 Updated last year
- Example DAGs using hooks and operators from Airflow Plugins ☆347 Updated 7 years ago
- Helm Charts for the Astronomer Platform, Apache Airflow as a Service on Kubernetes ☆486 Updated this week
- Data ingestion library for Amundsen to build graph and search index ☆204 Updated last year
- Airflow Unit Tests and Integration Tests ☆261 Updated 3 years ago
- Construct Apache Airflow DAGs Declaratively via YAML configuration files ☆1,410 Updated this week
- Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow. ☆168 Updated 2 years ago
- PySpark methods to enhance developer productivity 📣 👯 🎉 ☆682 Updated 10 months ago
- Airflow Backfill UI based plugin for existing / new Airflow environment ☆64 Updated 5 years ago
- A boilerplate for writing PySpark Jobs ☆395 Updated last year
- The easiest way to run Airflow locally, with linting & tests for valid DAGs and Plugins. ☆257 Updated 4 years ago
- Assets related to the operation of Fishtown Analytics. ☆419 Updated last year
- A series of DAGs/Workflows to help maintain the operation of Airflow ☆1,762 Updated last year
- A Helm chart to install Apache Airflow on Kubernetes ☆290 Updated this week
- Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform ☆259 Updated 2 years ago
- Airflow basics tutorial ☆396 Updated 4 years ago
- CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer ☆429 Updated this week
- Front-end service library for Amundsen ☆279 Updated last month
- Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic. ☆534 Updated 3 years ago
- Python API for Deequ ☆809 Updated 9 months ago
- This repository has moved into https://github.com/dbt-labs/dbt-adapters ☆444 Updated 5 months ago
- PySpark test helper methods with beautiful error messages ☆746 Updated this week
- Great Expectations Airflow operator ☆169 Updated last month
- Spark style guide ☆271 Updated last year
- A Docker image and Kubernetes config files to run Airflow on Kubernetes ☆655 Updated 6 years ago
- A plugin for Apache Airflow that exposes REST endpoints for the command-line interfaces ☆326 Updated 5 years ago
- BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables. ☆417 Updated last week