cerndb / SparkTraining
Material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/
☆18Updated 8 months ago
Alternatives and similar repositories for SparkTraining
Users who are interested in SparkTraining are comparing it to the libraries listed below.
- Rocksdb state storage implementation for Structured Streaming.☆17Updated 5 years ago
- Magic to help Spark pipelines upgrade☆34Updated last year
- This project provides a reverse proxy for Spark UI on Kubernetes☆17Updated 2 years ago
- Task Metrics Explorer☆14Updated 6 years ago
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A…☆133Updated last month
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De…☆88Updated 3 months ago
- Performance optimization for Spark running on Kubernetes☆88Updated 5 years ago
- The Internals of Delta Lake☆187Updated 2 months ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in…☆94Updated 9 months ago
- The Internals of PySpark☆27Updated last year
- Edge2AI Workshop☆70Updated 7 months ago
- Don't Panic. This guide will help you when it feels like the end of the world.☆30Updated this week
- 📚 Tech blogs & talks by companies that run Apache Flink in production☆188Updated 2 months ago
- ☆110Updated last year
- Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with S…☆457Updated last month
- The Internals of Spark on Kubernetes☆72Updated 3 years ago
- Oozie Workflow to Airflow DAGs migration tool☆90Updated last month
- A Python Library to support running data quality rules while the spark job is running⚡☆197Updated last week
- Orchestrate Spark Jobs from Kubeflow Pipelines and poll for the status.☆53Updated 3 years ago
- ☆63Updated 6 years ago
- Examples of Spark 3.0☆45Updated 5 years ago
- Flowchart for debugging Spark applications☆106Updated last year
- ☆27Updated 5 years ago
- ☆27Updated 2 years ago
- Ambari stack service for installing and managing Apache Airflow on HDP cluster☆58Updated 7 years ago
- A general purpose framework for automating Cloudera Products☆69Updated 11 months ago
- Playground for Lakehouse (Iceberg, Hudi, Spark, Flink, Trino, DBT, Airflow, Kafka, Debezium CDC)☆64Updated 2 years ago
- Code snippets used in demos recorded for the blog.☆37Updated 3 weeks ago
- ☆32Updated last week
- Delta Lake examples☆238Updated last year