linkedin/dr-elephant

linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark

☆1,370

Alternatives and similar repositories for dr-elephant

Users that are interested in dr-elephant are comparing it to the libraries listed below.

qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,270 · Jun 24, 2026 · Updated 3 weeks ago
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 · Updated this week
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago
airbnb / reair
ReAir is a collection of easy-to-use tools for replicating tables and partitions between Hive data warehouses.
☆282 · Feb 27, 2019 · Updated 7 years ago
JerryLead / SparkInternals
Notes talking about the design and implementation of Apache Spark
☆5,361 · Apr 2, 2024 · Updated 2 years ago
databricks / spark-sql-perf
☆623 · Feb 26, 2022 · Updated 4 years ago
apache / carbondata
High performance data store solution
☆1,448 · Jul 4, 2026 · Updated 2 weeks ago
apache / hudi
Upserts, Deletes And Incremental Processing on Big Data.
☆6,193 · Updated this week
uber-archive / AthenaX
SQL-based streaming analytics platform at scale
☆1,224 · Jun 21, 2020 · Updated 6 years ago
uber-common / jvm-profiler
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
☆1,804 · May 21, 2026 · Updated 2 months ago
byzer-org / byzer-lang
Byzer (former MLSQL): A low-code open-source programming language for data pipeline, analytics and AI.
☆1,835 · May 29, 2024 · Updated 2 years ago
cloudera / livy
Livy is an open source REST interface for interacting with Apache Spark from anywhere
☆1,008 · Oct 5, 2022 · Updated 3 years ago
linkedin / transport
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…
☆306 · Jun 29, 2026 · Updated 3 weeks ago
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,056 · Updated this week
yaooqinn / spark-authorizer
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has been migrated to Apa…
☆183 · Apr 6, 2022 · Updated 4 years ago
linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
☆135 · Jan 11, 2024 · Updated 2 years ago
cubefs / compass
Compass is a task diagnosis platform for bigdata
☆405 · Nov 23, 2024 · Updated last year
lw-lin / CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
☆3,475 · May 18, 2022 · Updated 4 years ago
edp963 / wormhole
Wormhole is a SPaaS (Stream Processing as a Service) Platform
☆975 · Nov 16, 2022 · Updated 3 years ago
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆662 · Updated this week
LucaCanali / sparkMeasure
This repository contains the development code for sparkMeasure, an Apache Spark performance analysis and troubleshooting library. It simp…
☆827 · May 19, 2026 · Updated 2 months ago
apache / kylin
Apache Kylin
☆3,769 · Updated this week
Intel-bigdata / HiBench
HiBench is a big data benchmark suite.
☆1,485 · Dec 15, 2025 · Updated 7 months ago
apache / pinot
Apache Pinot - A realtime distributed OLAP datastore
☆6,117 · Updated this week
ExpediaGroup / waggle-dance
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
☆288 · Jun 25, 2026 · Updated 3 weeks ago
apache / eagle
Mirror of Apache Eagle
☆411 · Aug 22, 2020 · Updated 5 years ago
datahub-project / datahub
The Context Platform for your Data and AI Stack
☆12,315 · Updated this week
Tencent / Firestorm
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shu…
☆256 · Apr 7, 2023 · Updated 3 years ago
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
linkedin / coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
☆907 · Updated this week
apache / linkis
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications…
☆3,406 · Updated this week
apache / zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
☆6,645 · Updated this week
japila-books / apache-spark-internals
The Internals of Apache Spark
☆1,547 · Updated this week
Alluxio / alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
☆7,214 · Apr 29, 2025 · Updated last year
ColZer / DigAndBuried
Digging pits and filling them in
☆684 · Aug 18, 2016 · Updated 9 years ago
apache / ambari
Apache Ambari simplifies provisioning, managing, and monitoring of Apache Hadoop clusters.
☆2,306 · Updated this week
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,924 · Updated this week