monksy / awesome-data-engineering
A curated list of data engineering tools for software developers
☆10 · Updated 6 years ago
Alternatives and similar repositories for awesome-data-engineering:
Users who are interested in awesome-data-engineering are comparing it to the repositories listed below
- A curated list of awesome Databricks resources, including Spark ☆17 · Updated 9 months ago
- Supplementary material for Building a Modern Data Platform with Snowflake, from Pearson. ☆21 · Updated 3 years ago
- Example project using DBT, Databricks and AdventureWorks sample database ☆11 · Updated 2 years ago
- Slowly Changing Dimension type 2 in Hive query language, using an exclusive join technique with ORC Hive tables, partitioned and clustered… ☆16 · Updated 5 years ago
- https://www.packtpub.com/books/info/authors/tomasz-lelek ☆12 · Updated 3 years ago
- Daily-updated reading list for designing High Scalability, High Availability, High Stability back-end systems - Pull requests are gre… ☆14 · Updated 2 years ago
- Connect DbVisualizer to Hortonworks HiveServer2 ☆9 · Updated 10 years ago
- Hadoop/Hive/Spark container to perform CI tests ☆11 · Updated 4 years ago
- Road to Azure Data Engineer Part-II: DP-201 - Designing an Azure Data Solution ☆19 · Updated 4 years ago
- This workshop teaches how Apache Kafka works and how you can use it to build applications that react to events as they happen. ☆12 · Updated 3 years ago
- Recommender System (Java, Apache Spark) ☆9 · Updated 6 years ago
- Apache Kafka 1.0 Cookbook, published by Packt ☆21 · Updated 2 years ago
- A sample project for KSQL along with Debezium and Kafka Connect ☆15 · Updated 2 years ago
- This project describes how to write a full ETL data pipeline using Spark. ☆15 · Updated 2 years ago
- Stream Processing Workshop ☆22 · Updated 8 months ago
- Fundamentals of Apache Flink [video], published by Packt ☆12 · Updated 2 years ago
- Apache Spark Interview Questions and Answers ☆20 · Updated 4 years ago
- Learning ElasticSearch 6 [video], published by Packt ☆14 · Updated 4 years ago
- Code repository for Elasticsearch 5.x Cookbook Third Edition, published by Packt ☆18 · Updated 4 years ago
- Labs and data files for a full-day Spark workshop ☆24 · Updated last year
- Pipeline library for StreamSets Data Collector and Transformer ☆33 · Updated 2 years ago
- Awesome list of dataops products, open source and resources ☆24 · Updated 2 years ago
- Example Python and R code for Cloudera Machine Learning (CML) training ☆14 · Updated 4 years ago
- Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink ☆31 · Updated 2 years ago
- Collection of Databricks and Jupyter Notebooks ☆21 · Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆54 · Updated last year
- Hands-On Microservices – Monitoring and Testing, published by Packt ☆18 · Updated 2 years ago
- Cloud Native Architectures, published by Packt ☆31 · Updated 2 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 · Updated last year
- A boilerplate project for Azure Big Data PaaS services ☆14 · Updated 2 years ago