fithisux / experiment-with-trino-minio-hive
☆13 Updated last year
Alternatives and similar repositories for experiment-with-trino-minio-hive
Users who are interested in experiment-with-trino-minio-hive are comparing it to the repositories listed below
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 Updated 2 weeks ago
- Delta lake and filesystem helper methods ☆51 Updated last year
- A platform and cloud-based service for data sharing based on the Delta Sharing protocol. ☆21 Updated last year
- A Table format agnostic data sharing framework ☆38 Updated last year
- Personal finance project to automatically collect Swiss banking transactions into a DWH and visualise them ☆26 Updated last year
- A write-audit-publish implementation on a data lake without the JVM ☆46 Updated last year
- Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, .... ☆76 Updated last week
- Yet Another (Spark) ETL Framework ☆21 Updated last year
- Fake Pandas / PySpark DataFrame creator ☆48 Updated last year
- Delta Lake Documentation ☆49 Updated last year
- A platform to manage the data product life cycle ☆20 Updated this week
- Run Apache Airflow on OpenShift ☆14 Updated 4 years ago
- How to evaluate the Quality of your Data with Great Expectations and Spark. ☆31 Updated 2 years ago
- Delta reader for the Ray open-source toolkit for building ML applications ☆46 Updated last year
- This repository contains recipes for Apache Pinot. ☆30 Updated 6 months ago
- FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more... ☆22 Updated this week
- SQL query executor on remote DuckDB instance using Apache Arrow Flight RPC through Streamlit Web interface. ☆17 Updated 10 months ago
- Unity Catalog UI ☆42 Updated 11 months ago
- Code examples for the Introduction to Kubeflow course ☆14 Updated 4 years ago
- Code that was used as an example during the Data+AI Summit 2020 ☆15 Updated 4 years ago
- Data Catalog for Databases and Data Warehouses ☆35 Updated last year
- This repository contains the tpcds queries together with the code required to run this benchmark for dbt and duckdb ☆18 Updated last year
- Full stack data engineering tools and infrastructure set-up ☆56 Updated 4 years ago
- Utility functions for dbt projects running on Spark ☆33 Updated 6 months ago
- DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability. DataOps TestGen delivers simple, fast data qualit… ☆61 Updated last week
- Writing PySpark logs in Apache Spark and Databricks ☆17 Updated 3 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆60 Updated 2 years ago
- Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines. ☆133 Updated this week
- Test data management tool for any data source, batch or real-time. Generate, validate and clean up data all in one tool. ☆68 Updated 2 weeks ago
- Cost Efficient Data Pipelines with DuckDB ☆57 Updated 3 months ago