lightcopy/parquet-index



Spark SQL index for Parquet tables

☆134

Alternatives and similar repositories for parquet-index

Users who are interested in parquet-index are comparing it to the libraries listed below.


alibaba / SparkCube
SparkCube is an open-source project for extremely fast OLAP data analysis. SparkCube is an extension of Apache Spark.
☆136 · Mar 6, 2023 · Updated 3 years ago
JerryLead / SparkProfiler
Profiling Spark Applications for Performance Comparison and Diagnosis
☆16 · Nov 11, 2018 · Updated 7 years ago
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
SaurabhChawla100 / spark-radiant
Spark-Radiant is an Apache Spark performance and cost optimizer.
☆25 · Dec 31, 2024 · Updated last year
qubole / rubix
Cache File System optimized for columnar formats and object stores
☆188 · Aug 11, 2022 · Updated 3 years ago
yaooqinn / spark-authorizer
A Spark SQL extension that provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has been migrated to Apa…
☆183 · Apr 6, 2022 · Updated 4 years ago
yaooqinn / spark-history-cli
CLI tool for querying Apache Spark History Server REST API
☆28 · Mar 22, 2026 · Updated 4 months ago
AbsaOSS / hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
☆47 · Jul 17, 2025 · Updated last year
cerndb / SparkPlugins
Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…
☆96 · May 11, 2026 · Updated 2 months ago
apache / carbondata
High-performance data store solution
☆1,448 · Jul 4, 2026 · Updated 3 weeks ago
netease-bigdata / ne-spark-courseware
NetEase Spark Courses
☆15 · Sep 4, 2018 · Updated 7 years ago
jihoonson / iron-arrow
☆19 · Mar 24, 2018 · Updated 8 years ago
oap-project / sql-ds-cache
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
☆37 · Jan 3, 2023 · Updated 3 years ago
ndolgov / experiments
Code examples for my blog posts
☆22 · Nov 7, 2018 · Updated 7 years ago
linyiqun / open-source-patch
Keeps the patches that have been submitted to the open-source community.
☆16 · Oct 22, 2017 · Updated 8 years ago
AzimoLabs / kafka-to-avro-writer
Kafka to Avro Writer based on Apache Beam. It's a generic solution that reads data from multiple Kafka topics and stores it in cloud s…
☆25 · Apr 7, 2021 · Updated 5 years ago
liancheng / brainsuck
A simple optimizing Brainfuck compiler (used as the demo for my QCon Beijing 2015 talk)
☆61 · Sep 23, 2022 · Updated 3 years ago
lensesio / avro-sql
Use SQL to transform your avro schema/records
☆28 · Jan 12, 2018 · Updated 8 years ago
Qihoo360 / XSQL
Unified SQL Analytics Engine Based on SparkSQL
☆211 · Dec 5, 2022 · Updated 3 years ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
blaze-init / spark-blaze-extension
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
☆11 · Apr 23, 2022 · Updated 4 years ago
uber / uberscriptquery
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
☆65 · Dec 17, 2023 · Updated 2 years ago
oap-project / gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
☆255 · Feb 21, 2023 · Updated 3 years ago
intenthq / pucket
Bucketing and partitioning system for Parquet
☆30 · May 22, 2018 · Updated 8 years ago
target / data-validator
A tool to validate data, built around Apache Spark.
☆102 · Jun 15, 2026 · Updated last month
yaooqinn / multi-tenancy-spark
A Fully HiveServer2-like Multi-tenancy Spark Thrift Server Supporting Impersonation and Multi-SparkContext with Ranger Authorization (GO …
☆10 · Jul 7, 2022 · Updated 4 years ago
squito / spark-memory
A tool for getting better debug info on Spark's memory usage
☆42 · Aug 21, 2019 · Updated 6 years ago
assafmendelson / DataSourceV2
☆23 · Oct 8, 2018 · Updated 7 years ago
hbutani / icebergSQL
Integration of Iceberg table management into Spark SQL
☆11 · Jan 21, 2020 · Updated 6 years ago
wooplevip / sedis
SQL for Redis
☆11 · Sep 16, 2022 · Updated 3 years ago
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,354 · Updated this week
sparsecode / DaFlow
Apache Spark-based Data Flow (ETL) Framework that supports multiple read/write destinations of different types and also supports multiple…
☆26 · Jun 7, 2021 · Updated 5 years ago
Tencent / Firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu…
☆256 · Apr 7, 2023 · Updated 3 years ago
allwefantasy / godear
ServiceFramework example project
☆10 · Apr 2, 2016 · Updated 10 years ago
51zero / eel-sdk
Big Data Toolkit for the JVM
☆147 · Nov 4, 2020 · Updated 5 years ago
yaooqinn / spark-ranger
Already merged into apache/incubator-kyuubi. ACL Management for Apache Spark SQL with Apache Ranger.
☆59 · Nov 11, 2021 · Updated 4 years ago
jupyterhub / kerberosauthenticator
A JupyterHub authenticator using Kerberos
☆12 · Jul 1, 2026 · Updated 3 weeks ago
sunchao / parquet-format-rs
Apache Parquet format for Rust, hosting the Thrift definition file and the generated .rs file
☆18 · Jul 6, 2022 · Updated 4 years ago
spoddutur / cloud-based-sql-engine-using-spark
Cloud-based SQL engine using SPARK where data is accessible as JDBC/ODBC data source via Spark ThriftServer.
☆30 · Jul 12, 2017 · Updated 9 years ago