maropu/spark-tpcds-datagen

All the things about TPC-DS in Apache Spark

☆111

Alternatives and similar repositories for spark-tpcds-datagen

Users who are interested in spark-tpcds-datagen are comparing it to the libraries listed below.

databricks / spark-sql-perf
☆623 · Feb 26, 2022 · Updated 4 years ago

IBM / spark-tpc-ds-performance-test
Use the TPC-DS benchmark to test Spark SQL performance
☆186 · Apr 27, 2020 · Updated 6 years ago

gregrahn / tpcds-kit
TPC-DS benchmark kit with some modifications/fixes
☆364 · Apr 16, 2024 · Updated 2 years ago

apache / kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆16 · May 22, 2026 · Updated 2 months ago

squito / spark-memory
A tool to get better debug info on spark's memory usage
☆42 · Aug 21, 2019 · Updated 6 years ago

oap-project / Gluten-Trino
Gluten: Plugin to Boost Trino's Performance
☆75 · Oct 25, 2023 · Updated 2 years ago

dhiraa / spark-tpcds
Apache Spark TPC-DS benchmark setup with EMR launch setup
☆18 · Jul 11, 2022 · Updated 4 years ago

oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255 · Feb 21, 2023 · Updated 3 years ago

valuko / TPCx-BB
Source code for TPCx-BB benchmark for Hive and SparkSQL on scale factor of 300 GB
☆10 · Jun 26, 2018 · Updated 8 years ago

hortonworks-spark / cloud-integration
Spark cloud integration: tests, cloud committers and more
☆20 · Jan 30, 2025 · Updated last year

apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,059 · Updated this week

databricks / tpcds-kit
TPC-DS benchmark kit with some modifications/fixes
☆107 · Aug 13, 2024 · Updated last year

apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,578 · Updated this week

trinodb / tpcds
Port of TPC-DS dsdgen to Java
☆22 · Jun 23, 2026 · Updated last month

hortonworks / hive-testbench
☆391 · Jan 25, 2024 · Updated 2 years ago

uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago

ssavvides / tpch-spark
TPC-H queries in Apache Spark SQL using native DataFrames API
☆99 · Jan 24, 2024 · Updated 2 years ago

apache / spark-connect-swift
Apache Spark Connect Client for Swift
☆31 · Updated this week

JonathanMace / tpcds
TPC-DS benchmarks including data generation with Spark and queries with Spark
☆15 · May 8, 2017 · Updated 9 years ago

aws-samples / eks-spark-benchmark
Performance optimization for Spark running on Kubernetes
☆87 · Aug 18, 2020 · Updated 5 years ago

apache / kyuubi-client
Client libraries of end users of Apache Kyuubi
☆11 · May 15, 2026 · Updated 2 months ago

hortonworks-spark / spark-schema-registry
Schema Registry integration for Apache Spark
☆40 · Nov 16, 2022 · Updated 3 years ago

alibaba / SparkCube
SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.
☆136 · Mar 6, 2023 · Updated 3 years ago

qubole / spark-state-store
Rocksdb state storage implementation for Structured Streaming.
☆17 · Oct 21, 2020 · Updated 5 years ago

apache / kyuubi-website
Apache Kyuubi Site
☆13 · Jun 12, 2026 · Updated last month

ibm-research-ireland / sparkoscope
Enabling Spark Optimization through Cross-stack Monitoring and Visualization
☆47 · Aug 23, 2017 · Updated 8 years ago

ehiggs / spark-terasort
Spark Terasort
☆121 · Apr 21, 2023 · Updated 3 years ago

cloudera / impala-tpcds-kit
TPC-DS Kit for Impala
☆170 · May 20, 2024 · Updated 2 years ago

zrlio / albis
Albis: High-Performance File Format for Big Data Systems
☆21 · Jul 12, 2018 · Updated 8 years ago

microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago

Kyligence / ClickHouse
ClickHouse® is a free analytics DBMS for big data
☆16 · May 13, 2026 · Updated 2 months ago

apache / orc-format
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
☆16 · May 15, 2026 · Updated 2 months ago

Azure / spark-cdm
A Spark connector for the Azure Common Data Model
☆15 · May 31, 2023 · Updated 3 years ago

apache / datafusion-comet
Apache DataFusion Comet Spark Accelerator
☆1,233 · Updated this week

ldbc / dbgen.JCC-H
☆22 · Apr 17, 2024 · Updated 2 years ago

apache / auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,780 · Updated this week

apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 · Updated this week

fdeantoni / spark-websocket-datasource
A sample custom Spark Structured Streaming Datasource with Websockets
☆12 · May 14, 2020 · Updated 6 years ago

maropu / spark-sql-flow-plugin
Visualize column-level data lineage in Spark SQL
☆92 · May 13, 2022 · Updated 4 years ago