fithisux / experiment-with-trino-minio-hive
☆13 Updated last year
Alternatives and similar repositories for experiment-with-trino-minio-hive
Users who are interested in experiment-with-trino-minio-hive are comparing it to the repositories listed below
- Yet Another (Spark) ETL Framework ☆21 Updated last year
- NiFi Processor for Apache Pulsar ☆10 Updated 7 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 Updated this week
- A Table format agnostic data sharing framework ☆38 Updated last year
- FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more... ☆21 Updated this week
- This repository contains recipes for Apache Pinot. ☆30 Updated 3 months ago
- Code that was used as an example during the Data+AI Summit 2020 ☆15 Updated 4 years ago
- Unity Catalog UI ☆40 Updated 9 months ago
- ☆22 Updated 3 months ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆46 Updated last year
- Personal Finance Project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated last year
- Delta Lake Documentation ☆49 Updated last year
- SQL query executor on remote DuckDB instance using Apache Arrow Flight RPC through Streamlit Web interface. ☆15 Updated 7 months ago
- Scalable CDC Pattern Implemented using PySpark ☆18 Updated 5 years ago
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated 10 months ago
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 Updated last year
- ☆12 Updated 3 years ago
- ☆18 Updated last year
- learning-by-doing data model built with dbt-core ☆13 Updated 6 months ago
- A platform to manage the data product life cycle ☆17 Updated this week
- Data validation library for PySpark 3.0.0 ☆33 Updated 2 years ago
- A project for exploring how Great Expectations can be used to ensure data quality and validate batches within a data pipeline defined in … ☆21 Updated 2 years ago
- Examples for High Performance Spark ☆16 Updated 7 months ago
- Data Catalog for Databases and Data Warehouses ☆35 Updated last year
- DataOps Observability is part of DataKitchen's Open Source Data Observability. DataOps Observability monitors every data journey from da… ☆46 Updated 3 weeks ago
- Utility functions for dbt projects running on Spark ☆34 Updated 4 months ago
- This repo demonstrates how to load a sample Parquet formatted file from an AWS S3 Bucket. A python job will then be submitted to a Apach… ☆19 Updated 9 years ago
- Nested Data (JSON/AVRO/XML) Parsing and Flattening in Spark ☆16 Updated last year
- ☆18 Updated 10 months ago
- Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb ☆21 Updated last year