apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆39,296 Updated this week
Related projects:
- Apache Hadoop ☆14,645 Updated this week
- Apache Flink ☆23,838 Updated this week
- Mirror of Apache Kafka ☆28,379 Updated this week
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows ☆36,304 Updated this week
- The official home of the Presto distributed SQL query engine for big data ☆15,919 Updated this week
- scikit-learn: machine learning in Python ☆59,439 Updated this week
- Free and Open, Distributed, RESTful Search Engine ☆69,528 Updated this week
- Apache Hive ☆5,490 Updated this week
- Apache Superset is a Data Visualization and Data Exploration Platform ☆61,768 Updated this week
- Production-Grade Container Scheduling and Management ☆109,853 Updated this week
- Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on sing… ☆26,107 Updated this week
- Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects,… ☆43,263 Updated this week
- Apache Druid: a high performance real-time analytics database. ☆13,405 Updated this week
- ClickHouse® is a real-time analytics DBMS ☆36,728 Updated this week
- Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strin… ☆66,305 Updated this week
- Apache Beam is a unified programming model for Batch and Streaming data processing. ☆7,772 Updated this week
- Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3 ☆14,321 Updated this week
- Apache Storm ☆6,590 Updated last week
- An Open Source Machine Learning Framework for Everyone ☆185,505 Updated this week
- The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems ☆68,472 Updated this week
- Apache Pulsar - distributed pub-sub messaging system ☆14,117 Updated this week
- Deep Learning for humans ☆61,612 Updated this week
- Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. ☆6,374 Updated this week
- Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing ☆14,283 Updated this week
- Apache HBase ☆5,199 Updated this week
- TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Ch… ☆36,894 Updated this week
- Apache ZooKeeper ☆12,143 Updated this week
- Google core libraries for Java ☆50,026 Updated this week
- Notes talking about the design and implementation of Apache Spark ☆5,255 Updated 5 months ago
- A library that provides an embeddable, persistent key-value store for fast storage. ☆28,269 Updated this week