mneedham / pinot-wiki
☆20 · Updated 2 years ago
Alternatives and similar repositories for pinot-wiki
Users that are interested in pinot-wiki are comparing it to the libraries listed below
- Streaming Synthetic Sales Data Generator: Streaming sales data generator for Apache Kafka, written in Python ☆44 · Updated 2 years ago
- Materials of the Official Helm Chart Webinar ☆27 · Updated 4 years ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on… ☆28 · Updated 3 years ago
- Apache Flink (Pyflink) and Related Projects ☆40 · Updated 3 months ago
- ☆18 · Updated last year
- Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validatio… ☆55 · Updated 2 years ago
- Resources for video demonstrations and blog posts related to DataOps on AWS ☆178 · Updated 3 years ago
- spark on kubernetes ☆104 · Updated 2 years ago
- New generation opensource data stack ☆70 · Updated 3 years ago
- ☆58 · Updated 11 months ago
- A general purpose framework for automating Cloudera Products ☆67 · Updated 4 months ago
- ☆91 · Updated 6 months ago
- Trino dbt demo project to mix and load BigQuery data with and in a local PostgreSQL database ☆75 · Updated 3 years ago
- Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data ☆47 · Updated last year
- The Python fake data producer for Apache Kafka® is a complete demo app allowing you to quickly produce JSON fake streaming datasets and … ☆85 · Updated last year
- Delta-Lake, ETL, Spark, Airflow ☆47 · Updated 2 years ago
- Pyspark boilerplate for running prod ready data pipeline ☆29 · Updated 4 years ago
- A repository of sample code to show data quality checking best practices using Airflow. ☆77 · Updated 2 years ago
- Data Engineering with Spark and Delta Lake ☆101 · Updated 2 years ago
- Apache Spark 3 - Structured Streaming Course Material ☆121 · Updated last year
- Dockerizing an Apache Spark Standalone Cluster ☆43 · Updated 3 years ago
- Repo for all my code on the articles I post on medium ☆107 · Updated 2 years ago
- Grafana dashboards and StatsD exporter config for Airflow monitoring ☆282 · Updated last year
- ETL pipeline using pyspark (Spark - Python) ☆117 · Updated 5 years ago
- Tutorial for setting up a Spark cluster running inside of Docker containers located on different machines ☆133 · Updated 2 years ago
- Fully reproducible, Dockerized, step-by-step, tutorial on how to mock a "real-time" Kafka data stream from a timestamped csv file. Detai… ☆40 · Updated 3 years ago
- EverythingApacheNiFi ☆113 · Updated last year
- Docker with Airflow and Spark standalone cluster ☆261 · Updated last year
- To provide a deeper understanding of how the modern, open-source data stack consisting of Iceberg, dbt, Trino, and Hive operates within a… ☆37 · Updated last year
- Code snippets for Data Engineering Design Patterns book ☆128 · Updated 3 months ago