tubular / sparkly
Helpers & syntactic sugar for PySpark.
☆62Dec 4, 2025Updated 2 months ago
Alternatives and similar repositories for sparkly
Users that are interested in sparkly are comparing it to the libraries listed below
- Collect and aggregate on spark events for profitz☆10Apr 22, 2022Updated 3 years ago
- Asynchronous actions for PySpark☆48Dec 2, 2021Updated 4 years ago
- A boilerplate for writing PySpark Jobs☆395Jan 21, 2024Updated 2 years ago
- Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.☆20Jan 11, 2018Updated 8 years ago
- Load data in BigQuery using Cloud Workflows, Firestore and Cloud Functions.☆12May 12, 2021Updated 4 years ago
- Apache (Py)Spark type annotations (stub files).☆118Aug 17, 2022Updated 3 years ago
- PySpark for ETL jobs including lineage to Apache Atlas in one script via code inspection☆18Jan 12, 2017Updated 9 years ago
- Record matching and entity resolution at scale in Spark☆36Oct 31, 2023Updated 2 years ago
- A low-overhead sampling profiler for PySpark, that outputs Flame Graphs☆16Dec 17, 2020Updated 5 years ago
- sparkql: Apache Spark SQL DataFrame schema management for sensible humans☆12Sep 18, 2023Updated 2 years ago
- HADOOP-CLI is an interactive command line shell that makes interacting with the Hadoop Distributed Filesystem (HDFS) simpler and more intu…☆36Feb 2, 2026Updated last week
- A library that provides useful extensions to Apache Spark and PySpark.☆232Jan 20, 2026Updated 3 weeks ago
- pyspark methods to enhance developer productivity 📣 👯 🎉☆682Mar 6, 2025Updated 11 months ago
- Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks☆360Jun 6, 2017Updated 8 years ago
- Asynchronous message queue consumer and scheduler☆59Dec 15, 2017Updated 8 years ago
- A Scalable Data Cleaning Library for PySpark.☆29Apr 4, 2019Updated 6 years ago
- A tool for anomaly detection over streaming data based on sentiment analysis☆30Jul 2, 2018Updated 7 years ago
- Capture changes of HBase to Kafka☆30May 3, 2016Updated 9 years ago
- Coding exercises for Apache Spark☆104Jun 4, 2015Updated 10 years ago
- A CLI to manage and monitor permissions in AWS Lake Formation☆25Feb 8, 2023Updated 3 years ago
- Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with S…☆458Dec 15, 2025Updated 2 months ago
- Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark☆1,541Dec 2, 2024Updated last year
- ☆10Jun 29, 2021Updated 4 years ago
- Spark style guide☆272Sep 30, 2024Updated last year
- A COBOL parser and Mainframe/EBCDIC data source for Apache Spark☆160Updated this week
- C++ native client for Impala and Hive, with Python / pandas bindings☆72Aug 15, 2018Updated 7 years ago
- Bulletproof Apache Spark jobs with fast root cause analysis of failures.☆73Mar 14, 2021Updated 4 years ago
- A simplified version of featuretools for Spark☆31Jun 14, 2019Updated 6 years ago
- A curated list of awesome Apache Spark packages and resources.☆1,861Oct 24, 2024Updated last year
- Python Package to Share/Edit Pandas/Polars DF with web interface!☆11Jun 10, 2025Updated 8 months ago
- Apache DataLab (incubating)☆152Oct 3, 2023Updated 2 years ago
- Data validation library for PySpark 3.0.0☆33Nov 11, 2022Updated 3 years ago
- Common utilities for Apache Kafka☆36Aug 7, 2023Updated 2 years ago
- DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities☆169Nov 20, 2019Updated 6 years ago
- This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spa…☆811Feb 5, 2026Updated last week
- Jupyter magics and kernels for working with remote Spark clusters☆1,363Sep 9, 2025Updated 5 months ago
- A simple elasticsearch frontend for serving astrophysical simulation catalog data☆10Aug 29, 2025Updated 5 months ago
- OpenTelemetry layer for HTTP/gRPC services☆10Feb 4, 2026Updated last week
- How to customize Tableau authentication using AWS Athena's JDBC Credentials Provider capabilities.☆14Jun 8, 2020Updated 5 years ago