Netflix/iceberg

Iceberg is a table format for large, slow-moving tabular data

☆494

Alternatives and similar repositories for iceberg

Users who are interested in iceberg are comparing it to the libraries listed below.

rdblue / s3committer
Hadoop output committers for S3
☆114 · Jul 9, 2020 · Updated 6 years ago
apache / iceberg
Apache Iceberg
☆9,078 · Updated this week
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,925 · Updated this week
Netflix / metacat
☆1,687 · Jul 16, 2026 · Updated last week
starburstdata / facebook-presto
Starburst Enterprise Distribution of Presto
☆45 · Aug 31, 2021 · Updated 4 years ago
linkedin / dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
☆1,370 · Aug 22, 2023 · Updated 2 years ago
zrlio / albis
Albis: High-Performance File Format for Big Data Systems
☆21 · Jul 12, 2018 · Updated 8 years ago
lyft / presto-gateway
A load balancer / proxy / gateway for prestodb
☆359 · Jul 25, 2024 · Updated last year
linkedin / transport
A framework for writing performant user-defined functions (UDFs) that are portable across a variety of engines including Apache Spark, Ap…
☆306 · Jun 29, 2026 · Updated 3 weeks ago
microsoft / hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
☆430 · Jan 14, 2022 · Updated 4 years ago
apache / hudi
Upserts, Deletes And Incremental Processing on Big Data.
☆6,194 · Updated this week
Netflix / genie
Distributed Big Data Orchestration Service
☆1,764 · Jul 13, 2026 · Updated last week
linkedin / coral
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
☆907 · Updated this week
qubole / spark-acid
ACID Data Source for Apache Spark based on Hive ACID
☆97 · Jul 7, 2021 · Updated 5 years ago
MarquezProject / marquez
Collect, aggregate, and visualize a data ecosystem's metadata
☆2,248 · Updated this week
awslabs / aws-glue-data-catalog-client-for-apache-hive-metastore
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog a…
☆230 · May 18, 2026 · Updated 2 months ago
paypal / gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
☆252 · Jul 10, 2025 · Updated last year
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
apache / livy
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
☆958 · Jul 9, 2026 · Updated 2 weeks ago
apache / pinot
Apache Pinot - A realtime distributed OLAP datastore
☆6,117 · Updated this week
AbsaOSS / spline
Data Lineage Tracking And Visualization Solution
☆663 · Updated this week
spotify / scio
A Scala API for Apache Beam and Google Cloud Dataflow.
☆2,625 · Updated this week
prestodb / presto
The official home of the Presto distributed SQL query engine for big data
☆16,720 · Updated this week
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,270 · Jun 24, 2026 · Updated last month
apache / parquet-java
Apache Parquet Java
☆3,069 · Updated this week
apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆16,949 · Updated this week
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago
qubole / streaminglens
Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
☆17 · Jan 21, 2020 · Updated 6 years ago
uber-common / jvm-profiler
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
☆1,804 · May 21, 2026 · Updated 2 months ago
holdenk / spark-testing-base
Base classes to use when writing tests with Spark
☆1,553 · Apr 20, 2026 · Updated 3 months ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
apache / bahir
Mirror of Apache Bahir
☆336 · Jul 7, 2023 · Updated 3 years ago
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,636 · Updated this week
OpenLineage / OpenLineage
An Open Standard for lineage metadata collection
☆2,560 · Updated this week
uber / marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
☆483 · Mar 19, 2023 · Updated 3 years ago
uber / RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
☆335 · Sep 29, 2023 · Updated 2 years ago
trinodb / trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
☆13,068 · Updated this week
linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
☆135 · Jan 11, 2024 · Updated 2 years ago
apache / incubator-toree
Mirror of Apache Toree (Incubating)
☆751 · Jul 17, 2026 · Updated last week