japila-books / pyspark-internals
The Internals of PySpark
☆26 Updated 5 months ago
Alternatives and similar repositories for pyspark-internals
Users who are interested in pyspark-internals are comparing it to the libraries listed below.
- A temporary home for LinkedIn's changes to Apache Iceberg (incubating) ☆61 Updated 6 months ago
- The Internals of Spark on Kubernetes ☆71 Updated 3 years ago
- Examples of Spark 3.0 ☆47 Updated 4 years ago
- Friendly ML feature store ☆45 Updated 3 years ago
- Presto Trino with Apache Hive Postgres metastore ☆42 Updated 9 months ago
- Magic to help Spark pipelines upgrade ☆35 Updated 8 months ago
- The Internals of Delta Lake ☆184 Updated 5 months ago
- Example for article Running Spark 3 with standalone Hive Metastore 3.0 ☆99 Updated 2 years ago
- A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino ☆21 Updated 3 years ago
- Flowchart for debugging Spark applications ☆105 Updated 8 months ago
- ☆58 Updated 10 months ago
- ACID Data Source for Apache Spark based on Hive ACID ☆97 Updated 3 years ago
- A simple Spark-powered ETL framework that just works 🍺 ☆181 Updated last month
- Data Sketches for Apache Spark ☆22 Updated 2 years ago
- Spark-Radiant is Apache Spark Performance and Cost Optimizer ☆25 Updated 5 months ago
- ☆63 Updated 5 years ago
- A library that provides useful extensions to Apache Spark and PySpark. ☆225 Updated 3 months ago
- Rocksdb state storage implementation for Structured Streaming. ☆17 Updated 4 years ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆83 Updated this week
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆125 Updated 3 weeks ago
- ☆40 Updated 2 years ago
- Delta Lake helper methods. No Spark dependency. ☆23 Updated 9 months ago
- ☆31 Updated 5 years ago
- A library that brings useful functions from various modern database management systems to Apache Spark ☆59 Updated last year
- The Workload Analyzer collects Presto® and Trino workload statistics, and analyzes them ☆135 Updated last year
- LinkedIn's version of Apache Calcite ☆23 Updated last month
- ☆24 Updated 4 years ago
- ☆24 Updated 9 months ago
- Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake. ☆29 Updated this week
- Trino Connector for Apache Paimon. ☆35 Updated last week