delta-io/delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

☆8,924

Alternatives and similar repositories for delta

Users who are interested in delta are comparing it to the libraries listed below.

apache / iceberg
Apache Iceberg
☆9,070 · Updated this week
apache / hudi
Upserts, Deletes And Incremental Processing on Big Data.
☆6,192 · Updated this week
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,371 · Mar 20, 2024 · Updated 2 years ago
apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆43,670 · Updated this week
trinodb / trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
☆13,061 · Updated this week
delta-io / delta-sharing
An open protocol for secure data sharing
☆952 · Updated this week
delta-io / delta-rs
A native Rust library for Delta Lake, with bindings into Python
☆3,267 · Updated this week
unitycatalog / unitycatalog
Open, Multi-modal Catalog for Data & AI
☆3,464 · Updated this week
prestodb / presto
The official home of the Presto distributed SQL query engine for big data
☆16,719 · Updated this week
apache / datafusion
Apache DataFusion SQL Query Engine
☆9,005 · Updated this week
apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆16,947 · Updated this week
apache / flink
Apache Flink
☆26,202 · Updated this week
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,353 · Updated this week
datahub-project / datahub
The Context Platform for your Data and AI Stack
☆12,320 · Updated this week
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,576 · Updated this week
apache / pinot
Apache Pinot - A realtime distributed OLAP datastore
☆6,117 · Updated this week
awslabs / deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
☆3,636 · Updated this week
dbt-labs / dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build application…
☆13,495 · Updated this week
apache / doris
Apache Doris is a real-time analytics and hybrid search database for AI agents.
☆15,655 · Updated this week
mlflow / mlflow
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, a…
☆27,161 · Updated this week
apache / airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
☆46,196 · Updated this week
duckdb / duckdb
DuckDB is an analytical in-process SQL database management system
☆39,618 · Updated this week
projectnessie / nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
☆1,481 · Updated this week
facebookincubator / velox
A composable and fully extensible C++ execution engine library for data management systems.
☆4,176 · Updated this week
amundsen-io / amundsen
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting…
☆4,782 · Jul 1, 2026 · Updated 3 weeks ago
ClickHouse / ClickHouse
ClickHouse® is a real-time analytics database management system
☆48,788 · Updated this week
apache / calcite
Apache Calcite
☆5,160 · Updated this week
airbytehq / airbyte
Open-source data movement for ELT pipelines and AI agents — from APIs, databases & files to warehouses, lakes, and AI applications. Both …
☆21,670 · Updated this week
debezium / debezium
Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.
☆12,937 · Updated this week
Alluxio / alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
☆7,213 · Apr 29, 2025 · Updated last year
apache / paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch …
☆3,346 · Updated this week
apache / druid
Apache Druid: a high performance real-time analytics database.
☆14,034 · Updated this week
StarRocks / starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly…
☆11,922 · Updated this week
lance-format / lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve…
☆6,836 · Updated this week
apache / beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
☆8,636 · Updated this week
treeverse / lakeFS
lakeFS - Data version control for your data lake | Git for data
☆5,460 · Updated this week
dagster-io / dagster
An orchestration platform for the development, production, and observation of data assets.
☆15,881 · Updated this week
databendlabs / databend
Data Agent Ready Warehouse : One for Analytics, Search, AI, Python Sandbox. — rebuilt from scratch. Unified architecture on your S3.
☆9,389 · Updated this week
fivetran / great_expectations
Always know what to expect from your data.
☆11,664 · Updated this week