ercoppa/HadoopInternals

Diagrams describing Apache Hadoop internals (2.3.0 or later).

☆428

Alternatives and similar repositories for HadoopInternals

Users interested in HadoopInternals are comparing it to the repositories listed below.

JerryLead / SparkInternals
Notes talking about the design and implementation of Apache Spark
☆5,361 · Apr 2, 2024 · Updated 2 years ago
tmalaska / HBase.MCC
HBase.MCC (HBase Multi Cluster Client). The goal is to support always-up solutions with HBase through multiple clusters.
☆14 · Nov 9, 2015 · Updated 10 years ago
ParallelProcessingLab / fogfaas
FoGFaaS: Add serverless computing (faas) to ifogsim
☆22 · Mar 30, 2025 · Updated last year
linkedin / dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
☆1,370 · Aug 22, 2023 · Updated 2 years ago
steveloughran / kerberos_and_hadoop
Kerberos and Hadoop: The Madness beyond the Gate
☆283 · Jul 28, 2023 · Updated 2 years ago
romainr / hadoop-tutorials-examples
Source, data and tutorials of the blog post video series of Hue, the Web UI for Hadoop.
☆231 · Feb 6, 2017 · Updated 9 years ago
ColZer / DigAndBuried
Digging holes and filling holes
☆684 · Aug 18, 2016 · Updated 9 years ago
japila-books / apache-spark-internals
The Internals of Apache Spark
☆1,547 · Updated this week
tmalaska / SparkOnALog
Examples of Integrating Spark Streaming, Flume, and HBase to solve Streaming problems
☆18 · Feb 27, 2014 · Updated 12 years ago
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,270 · Jun 24, 2026 · Updated 3 weeks ago
hadooparchitecturebook / hadoop-arch-book
Code repository for the O'Reilly book Hadoop Application Architectures
☆160 · May 26, 2015 · Updated 11 years ago
abajwa-hw / security-workshops
Workshops on how to set up security on Hadoop using HDP sandboxes
☆99 · Apr 11, 2018 · Updated 8 years ago
TritonDataCenter / binder
Triton/Manta DNS server over Apache Zookeeper
☆25 · Jul 8, 2026 · Updated last week
jaceklaskowski / kafka-notebook
The Internals of Apache Kafka
☆132 · Aug 29, 2022 · Updated 3 years ago
apache / incubator-retired-slider
Mirror of Apache Slider
☆79 · Dec 11, 2018 · Updated 7 years ago
tripadvisor / hive-query-tool
A web interface to Hive with flexible, user-friendly query customization
☆24 · Jun 30, 2013 · Updated 13 years ago
kite-sdk / kite
Kite SDK
☆393 · Nov 1, 2022 · Updated 3 years ago
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago
lw-lin / CoolplaySpark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
☆3,475 · May 18, 2022 · Updated 4 years ago
hadooparchitecturebook / clickstream-tutorial
Code for a tutorial on designing a clickstream analytics application using Hadoop
☆54 · May 20, 2015 · Updated 11 years ago
mkuthan / example-spark-kafka
Apache Spark and Apache Kafka integration example
☆122 · Dec 21, 2017 · Updated 8 years ago
hadooparchitecturebook / SparkStreaming.Sessionization
NRT Sessionization with Spark Streaming landing on HDFS and putting live stats in HBase
☆16 · Oct 31, 2014 · Updated 11 years ago
LinkedInAttic / camus
LinkedIn's previous generation Kafka to HDFS pipeline.
☆881 · Aug 27, 2020 · Updated 5 years ago
quantiply / grafana-druid-wikipedia
Example using Grafana with Druid
☆11 · Mar 27, 2015 · Updated 11 years ago
Intel-bigdata / HiBench
HiBench is a big data benchmark suite.
☆1,485 · Dec 15, 2025 · Updated 7 months ago
cloudera / mapreduce-tutorial
☆36 · Mar 31, 2017 · Updated 9 years ago
rstml / datastax-spark-streaming-demo
Counting Twitter hashtags using Spark Streaming and Cassandra
☆41 · Feb 16, 2015 · Updated 11 years ago
lw-lin / streaming-readings
Papers and reading material on streaming systems
☆734 · Feb 12, 2022 · Updated 4 years ago
sheetaldolas / Hive-JSON-Serde
Read/write JSON SerDe for Apache Hive.
☆21 · Dec 4, 2018 · Updated 7 years ago
apache / hadoop
Apache Hadoop
☆15,614 · Updated this week
Huawei-Spark / Backup-Repo
The released version of Astro (Spark SQL on HBase) has been moved to:
☆16 · Jul 23, 2015 · Updated 10 years ago
japila-books / spark-sql-internals
The Internals of Spark SQL
☆487 · Jan 25, 2026 · Updated 5 months ago
databricks / spark-sql-perf
☆623 · Feb 26, 2022 · Updated 4 years ago
apache / orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
☆767 · Updated this week
spark-notebook / spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
☆3,142 · May 16, 2023 · Updated 3 years ago
yaooqinn / spark-authorizer
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark | This repo is contributed to Apache Kyuubi | The project has moved to Apa…
☆183 · Apr 6, 2022 · Updated 4 years ago
buoyant-data / verspaetung
Verspätung is a small utility which aims to help identify delay of Kafka consumers
☆11 · Aug 4, 2016 · Updated 9 years ago
mahmoudparsian / data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
☆1,081 · Oct 14, 2024 · Updated last year
jprante / elasticsearch-functionscore-conditionalboost
Boost documents in Elasticsearch when they match dynamic conditions
☆18 · Jul 2, 2016 · Updated 10 years ago