KennethanCeyer / awesome-data-pipeline
Awesome list for data pipelines
☆34 Updated 2 years ago
Alternatives and similar repositories for awesome-data-pipeline
Users who are interested in awesome-data-pipeline are comparing it to the repositories listed below.
- Full stack data engineering tools and infrastructure set-up ☆53 Updated 4 years ago
- Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work ☆47 Updated 2 years ago
- Apache Spark Guide ☆31 Updated 3 years ago
- Spark data pipeline that processes movie ratings data. ☆28 Updated this week
- ☆12 Updated 3 years ago
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… ☆41 Updated 2 years ago
- Sample code to collect Apache Iceberg metrics for table monitoring ☆27 Updated 9 months ago
- Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR ☆18 Updated last month
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) ☆58 Updated last year
- Public source code for the Batch Processing with Apache Beam (Python) online course ☆18 Updated 4 years ago
- Design/Implement stream/batch architecture on NYC taxi data | #DE ☆25 Updated 4 years ago
- A curated list of awesome Databricks resources, including Spark ☆19 Updated 11 months ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. ☆31 Updated last year
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 Updated 2 years ago
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… ☆25 Updated 6 months ago
- A Snowflake GPT Demo using SqlAlchemy ☆23 Updated 2 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 Updated 2 years ago
- Awesome List for Data Operations ☆24 Updated 4 years ago
- Sample Airflow DAGs ☆62 Updated 2 years ago
- ☆46 Updated this week
- Building Data Lakehouse by open source technology. Support end to end data pipeline, from source data on AWS S3 to Lakehouse, visualize a… ☆27 Updated last year
- Hadoop/Hive/Spark container to perform CI tests ☆11 Updated 4 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆33 Updated 3 years ago
- Apache Airflow advanced functionalities examples ☆19 Updated last year
- A modern ELT demo using airbyte, dbt, snowflake and dagster ☆28 Updated 2 years ago
- simplify working with DataHub API endpoints ☆49 Updated 2 months ago
- This is a repo with links to everything you'd ever want to learn about data engineering ☆10 Updated 6 months ago
- For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR ☆66 Updated 3 years ago
- DataHub on AWS demonstration resources ☆10 Updated 2 years ago
- New generation opensource data stack ☆68 Updated 3 years ago