BlueGranite / tpc-ds-dataset-generator
Generate big TPC-DS datasets with Databricks
☆18 Updated 3 years ago
Alternatives and similar repositories for tpc-ds-dataset-generator:
Users who are interested in tpc-ds-dataset-generator are comparing it to the libraries listed below:
- Databricks Migration Tools ☆43 Updated 3 years ago
- TPCDS benchmark for various engines ☆18 Updated 3 years ago
- This repo is a collection of tools to deploy, manage and operate a Databricks based Lakehouse. ☆44 Updated last month
- An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset ☆108 Updated last year
- Yet Another (Spark) ETL Framework ☆20 Updated last year
- A Spark datasource for the HadoopOffice library ☆38 Updated 2 years ago
- Make your libraries magically appear in Databricks. ☆47 Updated last year
- Splittable SAS (.sas7bdat) Input Format for Hadoop and Spark SQL ☆90 Updated last year
- Flowchart for debugging Spark applications ☆105 Updated 5 months ago
- A tool to validate data, built around Apache Spark. ☆101 Updated last week
- A library that brings useful functions from various modern database management systems to Apache Spark ☆58 Updated last year
- Databricks Implementation of the TPC-DI Specification using Traditional Notebooks and/or Delta Live Tables ☆81 Updated 2 weeks ago
- dbt adapter for Azure Synapse Dedicated SQL Pools ☆70 Updated 4 months ago
- Azure Deployments using Terraform ☆30 Updated 2 years ago
- Spark and Delta Lake Workshop ☆22 Updated 2 years ago
- Monitoring Azure Databricks jobs ☆222 Updated 5 months ago
- Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs ☆235 Updated last month
- ☆76 Updated 9 months ago
- An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr… ☆10 Updated 2 years ago
- A proof of concept of how to integrate Spark Lineage in Azure Purview ☆22 Updated 4 years ago
- Databricks Platform - Architecture, Security, Automation and much more!! ☆50 Updated 2 weeks ago
- Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline ☆150 Updated 7 months ago
- The Internals of Spark on Kubernetes ☆70 Updated 2 years ago
- Sample processing code using Spark 2.1+ and Scala ☆51 Updated 4 years ago
- This project provides a client library that allows Azure SQL DB or SQL Server to act as an input source or output sink for Spark jobs. ☆75 Updated 4 years ago
- End-to-end Machine Learning Pipeline demo using Delta Lake, MLflow and AzureML in Azure Databricks ☆18 Updated 5 years ago
- Custom PySpark Data Sources ☆40 Updated 2 months ago
- Delta lake and filesystem helper methods ☆51 Updated last year
- type-class based data cleansing library for Apache Spark SQL ☆78 Updated 5 years ago
- Cask Hydrator Plugins Repository ☆68 Updated this week