monksy / awesome-data-engineering
A curated list of data engineering tools for software developers
☆10 · Updated 6 years ago
Alternatives and similar repositories for awesome-data-engineering:
Users interested in awesome-data-engineering also compare it to the repositories listed below:
- A curated list of awesome Databricks resources, including Spark ☆16 · Updated 7 months ago
- Apache Kafka Guide ☆30 · Updated 3 years ago
- A sample project for KSQL along with debezium and kafka connect ☆15 · Updated 2 years ago
- Awesome developing: ideas for how to create better software code and collaboration ☆31 · Updated last year
- Example project using DBT, Databricks and AdventureWorks sample database ☆11 · Updated 2 years ago
- A boilerplate project for Azure Big Data PaaS services ☆14 · Updated 2 years ago
- Supplementary material for Building a Modern Data Platform with Snowflake, from Pearson. ☆21 · Updated 3 years ago
- Data engineering interview Q&A for the data community, by the data community ☆63 · Updated 4 years ago
- Realistic Data Generation tool for Big Data Appliances and AI Solutions ☆13 · Updated 2 years ago
- Apache Spark Interview Question and Answers ☆21 · Updated 4 years ago
- https://www.packtpub.com/books/info/authors/tomasz-lelek ☆12 · Updated 3 years ago
- Critical Success Factor (CSF) tutorial ☆18 · Updated last year
- AWS Big Data Certification ☆25 · Updated last month
- Leadership and management ideas ☆36 · Updated 2 months ago
- Labs and data files for a full-day Spark workshop ☆24 · Updated last year
- Daily-updated reading list for designing High Scalability, High Availability, High Stability back-end systems - Pull requests are gre… ☆14 · Updated 2 years ago
- Pipeline library for StreamSets Data Collector and Transformer ☆32 · Updated 2 years ago
- This is a basic Apache Pinot example for ingesting real-time MySQL change logs using Debezium ☆27 · Updated 4 years ago
- DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics control framework that can be used to monitor, log, aud… ☆25 · Updated last month
- Hadoop/Hive/Spark container to perform CI tests ☆11 · Updated 4 years ago
- Apache Airflow Guide ☆27 · Updated 9 months ago
- Showcase of some basic Kafka concepts and their integration with Spring Boot ☆9 · Updated 4 years ago
- Git flow help: research on Git flow, GitHub flow, GitLab flow, etc. ☆28 · Updated last year
- Code for Apache Hudi, Apache Iceberg and Delta Lake analysis ☆9 · Updated last year
- Notes from 100 days with Kubernetes ☆30 · Updated 6 years ago
- Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validatio… ☆53 · Updated last year
- ☆12 · Updated 3 years ago
- ☆11 · Updated 7 years ago
- Apache Spark-based framework for analyzing A/B experiments ☆13 · Updated 3 months ago