Apache Spark is an open-source distributed computing system for big data processing and analytics. It offers an alternative to the traditional MapReduce model, with better performance and ease of use. As a unified analytics engine, it handles large-scale data processing by combining in-memory computation with the resilient distributed dataset (RDD) abstraction. Spark supports Java, Scala, Python, and R, and it integrates with data sources such as Hadoop Distributed File System (HDFS), Apache HBase, and Apache Cassandra. Its ecosystem includes libraries for SQL (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib), and graph processing (GraphX), so it covers a wide range of data processing and analytics workloads. For application developers, Spark provides tools to scale applications and express complex data transformations, which shortens the path from data to insight in data-intensive applications.
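To make the RDD idea above more concrete, here is a minimal sketch in plain Python. It is not Spark's API: `ToyRDD` and all of its methods are hypothetical names invented for this illustration, and everything runs in a single process. It only shows the general pattern of recording transformations lazily, running them when an action is called, and keeping the result in memory for reuse.

```python
class ToyRDD:
    """Hypothetical toy stand-in for an RDD (not Spark's real class)."""

    def __init__(self, partitions, lineage=None):
        self._partitions = partitions      # source data, split into chunks
        self._lineage = lineage or []      # recorded transformations, not yet run
        self._cache_enabled = False
        self._cache = None                 # filled by the first action if cache() was called

    def map(self, fn):
        # Transformation: only extends the lineage, computes nothing.
        return ToyRDD(self._partitions, self._lineage + [("map", fn)])

    def filter(self, pred):
        # Transformation: only extends the lineage, computes nothing.
        return ToyRDD(self._partitions, self._lineage + [("filter", pred)])

    def cache(self):
        # Ask for the computed partitions to be kept in memory.
        self._cache_enabled = True
        return self

    def _compute(self):
        if self._cache is not None:
            return self._cache
        result = []
        for part in self._partitions:
            items = list(part)
            # Replay the lineage on this partition. Because the lineage is
            # kept, a lost partition could be rebuilt the same way.
            for kind, fn in self._lineage:
                if kind == "map":
                    items = [fn(x) for x in items]
                else:
                    items = [x for x in items if fn(x)]
            result.append(items)
        if self._cache_enabled:
            self._cache = result
        return result

    def collect(self):
        # Action: triggers the computation and flattens the partitions.
        return [x for part in self._compute() for x in part]

    def count(self):
        # Action: triggers the computation (or reuses the cache).
        return sum(len(part) for part in self._compute())


calls = []  # records every invocation of square()


def square(x):
    calls.append(x)
    return x * x


even_squares = ToyRDD([[1, 2, 3], [4, 5, 6]]).map(square).filter(lambda v: v % 2 == 0).cache()
calls_before_action = len(calls)   # still 0: transformations are lazy
collected = even_squares.collect() # first action runs the lineage
calls_after_collect = len(calls)
total = even_squares.count()       # second action is served from the cache
calls_after_count = len(calls)
```

In this sketch the second action does not call `square` again, which is the toy analogue of reusing an in-memory dataset across several computations instead of recomputing it from the source each time.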
The list below shows the most prominent open-source Apache Spark projects. Compare them, along with their alternative and complementary packages, to find the best fit for your app.
- Apache Spark - A unified analytics engine for large-scale data processing (☆40,989, updated this week)
- Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce,… (☆28,124, updated last year)
- Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering. (☆30,188, updated last week)
- Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data. (☆27,236, updated last week)
- Learn and understand Docker and container technologies, with real DevOps practice! (☆25,351, updated 4 months ago)
- Beginner's guide to big data (☆16,360, updated last year)
- GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT fine-tuning and query with web search. All with a neat UI. (☆15,423, updated last month)
- List of Data Science Cheatsheets to rule the world (☆15,181, updated 9 months ago)
- Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, source code analysis, and more. Touches on Flink Connector, Metrics, Library, DataStream API, Ta… (☆14,747, updated last month)
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. (☆14,449, updated this week)
- [Big Tech Interview Column] A technical guide for Java programmers, with interview questions, system architecture, career tips, mainstream middleware, and more, to help you become a better version of yourself!