JerryLead/SparkInternals

Notes talking about the design and implementation of Apache Spark

☆5,361

Alternatives and similar repositories for SparkInternals

Users that are interested in SparkInternals are comparing it to the libraries listed below.

lw-lin / CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
☆3,475 · May 18, 2022 · Updated 4 years ago
ColZer / DigAndBuried
Digging holes and filling them in
☆684 · Aug 18, 2016 · Updated 9 years ago
apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆43,690 · Updated this week
japila-books / apache-spark-internals
The Internals of Apache Spark
☆1,547 · Jul 18, 2026 · Updated last week
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago
databricks / learning-spark
Example code from Learning Spark book
☆3,892 · Jun 30, 2026 · Updated 3 weeks ago
databricks / scala-style-guide
Databricks Scala Coding Style Guide
☆2,805 · Apr 5, 2024 · Updated 2 years ago
JerryLead / SparkLearning
Learning to write Spark examples
☆159 · Aug 20, 2014 · Updated 11 years ago
byzer-org / byzer-lang
Byzer (former MLSQL): A low-code open-source programming language for data pipeline, analytics and AI.
☆1,835 · May 29, 2024 · Updated 2 years ago
endymecy / spark-ml-source-analysis
Analysis of Spark ML algorithm principles and their concrete source code implementations
☆1,958 · Mar 25, 2019 · Updated 7 years ago
jacksu / utils4s
A collection of test cases and related materials gathered while using Scala and Spark
☆1,082 · Feb 9, 2019 · Updated 7 years ago
apache / flink
Apache Flink
☆26,216 · Updated this week
linkedin / dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
☆1,370 · Aug 22, 2023 · Updated 2 years ago
linbojin / spark-notes
Deep Dive into Apache Spark: an in-depth study of the Spark source code
☆259 · Jan 5, 2017 · Updated 9 years ago
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,926 · Updated this week
awesome-spark / awesome-spark
A curated list of awesome Apache Spark packages and resources.
☆1,885 · Feb 27, 2026 · Updated 4 months ago
Alluxio / alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
☆7,211 · Apr 29, 2025 · Updated last year
flink-china / flink-training-course
Flink video courses in Chinese (continuously updated...)
☆4,626 · Jun 18, 2020 · Updated 6 years ago
JerryLead / ApacheSparkBook
☆135 · Jul 6, 2021 · Updated 5 years ago
marsishandsome / SparkSQL-Internal
☆131 · Jan 10, 2019 · Updated 7 years ago
yahoo / TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
☆3,845 · Jul 10, 2023 · Updated 3 years ago
lw-lin / streaming-readings
Papers and readings related to streaming systems
☆734 · Feb 12, 2022 · Updated 4 years ago
Angel-ML / angel
A Flexible and Powerful Parameter Server for large-scale machine learning
☆6,785 · Jun 8, 2026 · Updated last month
databricks / spark-sql-perf
☆623 · Feb 26, 2022 · Updated 4 years ago
shijinkui / spark_study
Spark source code study
☆299 · Jan 21, 2016 · Updated 10 years ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,553 · Apr 20, 2026 · Updated 3 months ago
summerDG / spark-code-analysis
☆179 · Sep 3, 2017 · Updated 8 years ago
neoremind / kraps-rpc
A RPC framework leveraging Spark RPC module
☆207 · Mar 13, 2019 · Updated 7 years ago
databricks / spark-knowledgebase
Spark Knowledge Base
☆333 · Oct 1, 2020 · Updated 5 years ago
spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
☆3,142 · May 16, 2023 · Updated 3 years ago
apache / carbondata
High performance data store solution
☆1,448 · Jul 4, 2026 · Updated 3 weeks ago
high-performance-spark / high-performance-spark-examples
Examples for High Performance Spark
☆532 · May 3, 2026 · Updated 2 months ago
apache / kylin
Apache Kylin
☆3,771 · Jul 16, 2026 · Updated last week
japila-books / spark-structured-streaming-internals
The Internals of Spark Structured Streaming
☆420 · Mar 3, 2026 · Updated 4 months ago
cloudera / livy
Livy is an open source REST interface for interacting with Apache Spark from anywhere
☆1,007 · Oct 5, 2022 · Updated 3 years ago
japila-books / spark-sql-internals
The Internals of Spark SQL
☆487 · Jan 25, 2026 · Updated 6 months ago
zhisheng17 / flink-learning
Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Touches on Flink Connector, Metrics, Library, DataStream API, Ta…
☆15,081 · May 6, 2026 · Updated 2 months ago
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 · Updated this week
apache / zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
☆6,645 · Updated this week