KennethanCeyer / awesome-data-pipeline
Awesome list for data pipelines
☆ 34 · Updated 2 years ago
Alternatives and similar repositories for awesome-data-pipeline
Users who are interested in awesome-data-pipeline are comparing it to the libraries listed below.
- Apache Spark Guide · ☆ 31 · Updated 3 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data · ☆ 33 · Updated 3 years ago
- Apache Airflow Guide · ☆ 28 · Updated last year
- A curated list of awesome DataOps tools · ☆ 191 · Updated this week
- Full stack data engineering tools and infrastructure set-up · ☆ 53 · Updated 4 years ago
- Data Tools Subjective List · ☆ 86 · Updated last year
- Awesome list of DataOps products, open source and resources · ☆ 24 · Updated 3 years ago
- New generation open source data stack · ☆ 70 · Updated 3 years ago
- dlt-dagster-demo · ☆ 11 · Updated last year
- A curated list of awesome Databricks resources, including Spark · ☆ 20 · Updated last year
- A collection of resources and blogs about Apache Superset · ☆ 83 · Updated 3 years ago
- A curated list of Dagster code snippets for data engineers · ☆ 56 · Updated last year
- Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta lake, Postgres + Debezium CDC, MySQL, … · ☆ 29 · Updated last month
- Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for A… · ☆ 41 · Updated 3 years ago
- A curated list of awesome blogs, videos, tools and resources about Data Contracts · ☆ 177 · Updated 11 months ago
- A curated list of awesome open source and commercial platforms for serving models in production · ☆ 39 · Updated 3 years ago
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python · ☆ 44 · Updated 2 years ago
- A curated list of awesome open source tools and commercial products that will help you manage machine learning and data-science workflows… · ☆ 24 · Updated 3 years ago
- A curated list of banking technologies and resources · ☆ 22 · Updated 2 years ago
- Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team … · ☆ 121 · Updated last week
- dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases. · ☆ 57 · Updated 3 years ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. · ☆ 59 · Updated this week
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC) · ☆ 59 · Updated last year
- Open Data Stack Projects: Examples of End to End Data Engineering Projects · ☆ 86 · Updated 2 years ago
- A real-time event pipeline around Kafka Ecosystem for Chicago Transit Authority. · ☆ 31 · Updated last year
- Fivetran data models for QuickBooks using dbt. · ☆ 34 · Updated this week
- ☆ 54 · Updated last week
- This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenario… · ☆ 25 · Updated 2 weeks ago
- Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observ… · ☆ 154 · Updated this week
- Open Source Data Quality Monitoring. · ☆ 156 · Updated this week