Smart Storage Management for Big Data, a comprehensive hot/cold data optimized solution
☆140Jan 3, 2023Updated 3 years ago
Alternatives and similar repositories for SSM
Users that are interested in SSM are comparing it to the libraries listed below
- A tool for scale and performance testing of HDFS with a specific focus on the NameNode.☆134Jan 11, 2024Updated 2 years ago
- This project keeps the patches that were submitted to the open-source community☆16Oct 22, 2017Updated 8 years ago
- Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.☆1,516Updated this week
- [Merged into apache/incubator-kyuubi] ACL Management for Apache Spark SQL with Apache Ranger.☆58Nov 11, 2021Updated 4 years ago
- Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.☆1,039Updated this week
- Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu…☆257Apr 7, 2023Updated 2 years ago
- Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark☆1,371Aug 22, 2023Updated 2 years ago
- Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.☆284Feb 18, 2026Updated last week
- A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has been migrated to Apa…☆183Apr 6, 2022Updated 3 years ago
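- hadoop-cos (the CosN file system) provides integration support for big data computing frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on Tencent Cloud COS can be read and written the same way as HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.☆94Dec 23, 2025Updated 2 months ago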
- [Archived] A Fast Multi-tiered Distributed Storage System based on User-Level I/O☆74Mar 2, 2018Updated 7 years ago
- Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.☆2,304Updated this week
- A re-implementation of Hadoop DistCP in Apache Spark☆47Dec 20, 2023Updated 2 years ago
- ☆393Jan 25, 2024Updated 2 years ago
- MemEC: An Erasure-Coding-Based Distributed In-Memory Key-Value Store☆11Mar 30, 2017Updated 8 years ago
- Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange☆130Dec 19, 2024Updated last year
- Compass is a task diagnosis platform for big data☆405Nov 23, 2024Updated last year
- Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.☆1,171Updated this week
- Hadoop filesystem implementation for Aliyun OSS☆13Feb 14, 2016Updated 10 years ago
- Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.☆321Jan 22, 2026Updated last month
- NameNodeAnalytics is a self-help utility for scouting and maintaining the namespace of an HDFS instance.☆120Nov 25, 2025Updated 3 months ago
- This is archive of SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvid…☆257May 13, 2019Updated 6 years ago
- Performance Analysis Tool☆78Nov 25, 2025Updated 3 months ago
- High performance data store solution☆1,446Feb 21, 2026Updated last week
- ☆15Aug 8, 2023Updated 2 years ago
- A Hadoop-compatible FUSE for all.☆29Sep 25, 2024Updated last year
- Uniffle is a high performance, general purpose Remote Shuffle Service.☆446Feb 12, 2026Updated 2 weeks ago
- Scalable NameNode RPC Proxy for HDFS Federation☆87Apr 19, 2016Updated 9 years ago
- Mirror of Apache crail (Incubating)☆151Jul 3, 2022Updated 3 years ago
- Giraffa FileSystem (Slack: giraffa-fs.slack.com)☆18Mar 8, 2017Updated 8 years ago
- HiBench is a big data benchmark suite.☆1,489Dec 15, 2025Updated 2 months ago
- A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…☆303Oct 30, 2025Updated 4 months ago
- Spark Connector to read and write with Pulsar☆118Updated this week
- cephfs-hadoop☆58Dec 10, 2020Updated 5 years ago
- Alluxio, data orchestration for analytics and machine learning in the cloud☆7,157Apr 29, 2025Updated 10 months ago
- Demonstration of a Hive Input Format for Iceberg☆26Mar 12, 2021Updated 4 years ago
- Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.☆257Feb 21, 2023Updated 3 years ago
- Upserts, Deletes And Incremental Processing on Big Data.☆6,098Updated this week
- Tool for gathering block and replica metadata from HDFS. It also builds a heat map showing how replicas are distributed along disks an…☆55May 9, 2017Updated 8 years ago