apache/datafusion-comet

apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator

☆1,230

Alternatives and similar repositories for datafusion-comet

Users that are interested in datafusion-comet are comparing it to the libraries listed below.

apache / auron
The Auron accelerator for distributed computing framework (e.g., Spark) leverages native vectorized execution to accelerate query process…
☆1,777 · Updated this week
apache / datafusion-ballista
Apache DataFusion Ballista Distributed Query Engine
☆2,089 · Updated this week
apache / gluten
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
☆1,575 · Updated this week
apache / datafusion
Apache DataFusion SQL Query Engine
☆8,991 · Updated this week
apache / iceberg-rust
Apache Iceberg
☆1,343 · Updated this week
apache / datafusion-ray
Apache DataFusion Ray
☆230 · May 15, 2026 · Updated 2 months ago
substrait-io / substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
☆1,535 · Updated this week
apache / datafusion-python
Apache DataFusion Python Bindings
☆594 · Updated this week
facebookincubator / velox
A composable and fully extensible C++ execution engine library for data management systems.
☆4,172 · Updated this week
lakekeeper / lakekeeper
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
☆1,391 · Updated this week
apache / arrow-rs
Official Rust implementation of Apache Arrow
☆3,528 · Updated this week
apache / celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
☆1,056 · Updated this week
JanKaul / iceberg-rust
Unofficial rust implementation of Apache Iceberg with integration for Datafusion
☆241 · Updated this week
apache / paimon-rust
Apache Paimon Rust: the Rust implementation of Apache Paimon.
☆181 · Updated this week
apache / polaris
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
☆2,015 · Updated this week
unitycatalog / unitycatalog
Open, Multi-modal Catalog for Data & AI
☆3,460 · Updated this week
delta-io / delta-rs
A native Rust library for Delta Lake, with bindings into Python
☆3,263 · Updated this week
apache / hudi-rs
The native Rust implementation for Apache Hudi, with C++ & Python API bindings.
☆277 · Jun 26, 2026 · Updated 3 weeks ago
facebookincubator / nimble
New and extensible file format for storage of large columnar datasets.
☆728 · Updated this week
lakehq / sail
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
☆3,191 · Updated this week
apache / iceberg
Apache Iceberg
☆9,059 · Updated this week
lance-format / lance
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve…
☆6,822 · Updated this week
datafusion-contrib / ray-sql
Distributed SQL Query Engine in Python using Ray
☆245 · Oct 2, 2024 · Updated last year
Eventual-Inc / Daft
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
☆5,638 · Updated this week
apache / uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
☆451 · Updated this week
vortex-data / vortex
An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now…
☆3,090 · Updated this week
ArroyoSystems / arroyo
Distributed stream processing engine in Rust
☆4,964 · Updated this week
NVIDIA / cudf-spark
NVIDIA cuDF for Apache Spark plugin - accelerate Apache Spark with GPUs
☆989 · Updated this week
GlareDB / glaredb
GlareDB: A light and fast SQL database for analytics
☆1,017 · Nov 14, 2025 · Updated 8 months ago
projectnessie / nessie
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
☆1,482 · Updated this week
risingwavelabs / risingwave
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real time, at scale.
☆9,173 · Updated this week
apache / incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processin…
☆1,194 · Updated this week
icelake-io / icelake
Pure Rust Iceberg Implementation
☆162 · Aug 13, 2024 · Updated last year
cmu-db / optd-original
CMU-DB's Cascades optimizer framework
☆405 · Jan 6, 2025 · Updated last year
delta-io / delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Tr…
☆8,915 · Updated this week
datafusion-contrib / datafusion-dft
Batteries included CLI, TUI, and server implementations for DataFusion.
☆199 · Updated this week
apache / kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
☆2,352 · Updated this week
datafusion-contrib / datafusion-table-providers
DataFusion TableProviders for reading data from other systems
☆200 · Jul 7, 2026 · Updated last week
apache / opendal
Apache OpenDAL: One Layer, All Storage.
☆5,241 · Updated this week