cerndb / SparkTrainingLinks
Material for the course "Introduction to Apache Spark APIs for Data Processing" https://sparktraining.web.cern.ch/
☆17 · Updated 5 months ago
Alternatives and similar repositories for SparkTraining
Users interested in SparkTraining are comparing it to the libraries listed below.
- Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an A… ☆129 · Updated last month
- The Internals of PySpark ☆26 · Updated 9 months ago
- ☆18 · Updated last year
- Task Metrics Explorer ☆14 · Updated 6 years ago
- Includes notes on using Apache Spark, with drill down on Spark for Physics, how to run TPCDS on PySpark, how to create histograms with S… ☆455 · Updated 2 weeks ago
- RocksDB state storage implementation for Structured Streaming. ☆17 · Updated 5 years ago
- 📚 Tech blogs & talks by companies that run Apache Flink in production ☆177 · Updated last month
- Don't Panic. This guide will help you when it feels like the end of the world. ☆29 · Updated last month
- Magic to help Spark pipelines upgrade ☆34 · Updated last year
- The Internals of Delta Lake ☆186 · Updated 9 months ago
- Repo for everything open table formats (Iceberg, Hudi, Delta Lake) and the overall Lakehouse architecture ☆110 · Updated 4 months ago
- Code and examples of how to write and deploy Apache Spark Plugins. Spark plugins allow running custom code on the executors as they are in… ☆93 · Updated 5 months ago
- A repo for all Spark examples using RAPIDS Accelerator, including ETL, ML/DL, etc. ☆163 · Updated 3 weeks ago
- The Internals of Spark on Kubernetes ☆72 · Updated 3 years ago
- On-the-fly translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes. ☆35 · Updated 6 months ago
- O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian ☆222 · Updated 2 years ago
- Examples of Spark 3.0 ☆46 · Updated 4 years ago
- Performance optimization for Spark running on Kubernetes ☆89 · Updated 5 years ago
- ☆63 · Updated 5 years ago
- This project provides a reverse proxy for Spark UI on Kubernetes ☆16 · Updated 2 years ago
- Spark-Radiant is an Apache Spark performance and cost optimizer ☆25 · Updated 9 months ago
- LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as De… ☆84 · Updated last week
- Delta Lake examples ☆229 · Updated last year
- Sample processing code using Spark 2.1+ and Scala ☆51 · Updated 5 years ago
- All the things about TPC-DS in Apache Spark ☆107 · Updated 2 years ago
- Quick Guides from Dremio on Several topics ☆78 · Updated 2 weeks ago
- A Python Library to support running data quality rules while the spark job is running ⚡ ☆189 · Updated this week
- Oozie Workflow to Airflow DAGs migration tool ☆88 · Updated 7 months ago
- ☆268 · Updated 11 months ago
- Parquet file generator ☆22 · Updated 7 years ago